Preface


Claude Shannon, the father of Information Theory, described the fundamental 
problem of point-to-point communications in his classic 1948 paper as “that of 
reproducing at one point either exactly or approximately a message selected at 
another point.” How engineers solve this problem is the subject of this book. 
But unlike Shannon’s general problem, where the message can be an image, a 
sound clip, or a movie, here we restrict ourselves to bits. We thus envision that 
the original message is either a binary sequence to start with, or else that it was 
described using bits by a device outside our control and that our job is to reproduce 
the describing bits with high reliability. The issue of how images or text files are 
converted efficiently into bits is the subject of lossy and lossless data compression 
and is addressed in texts on information theory and on quantization. 


The engineering solutions to the point-to-point communication problem greatly 
depend on the available resources and on the channel between the points. They 
typically bring together beautiful techniques from Fourier Analysis, Hilbert Spaces, 
Probability Theory, and Decision Theory. The purpose of this book is to introduce 
the reader to these techniques and to their interplay. 


The book is intended for advanced undergraduates and beginning graduate stu- 
dents. The key prerequisites are basic courses in Calculus, Linear Algebra, and 
Probability Theory. A course in Linear Systems is a plus but not a must, because 
all the results from Linear Systems that are needed for this book are summarized 
in Chapters 5 and 6. But more importantly, the book requires a certain mathemat- 
ical maturity and patience, because we begin with first principles and develop the 
theory before discussing its engineering applications. The book is for those who 
appreciate the views along the way as much as getting to the destination; who like 
to “stop and smell the roses;” and who prefer fundamentals to acronyms. I firmly 
believe that those with a sound foundation can easily pick up the acronyms and 
learn the jargon on the job, but that once one leaves the academic environment, 
one rarely has the time or peace of mind to study fundamentals. 


In the early stages of the planning of this book I took a decision that greatly 
influenced the project. I decided that every key concept should be unambiguously 
defined; that every key result should be stated as a mathematical theorem; and 
that every mathematical theorem should be correct. This, I believe, makes for 
a solid foundation on which one can build with confidence. But it is also a tall 
order. It required that I scrutinize each “classical” result before I used it in order 
to be sure that I knew what the needed qualifiers were, and it forced me to include 
background material to which the reader may have already been exposed, because
I needed the results “done right.” Hence Chapters 5 and 6 on Linear Systems and 
Fourier Analysis. This is also partly the reason why the book is so long. When I 
started out my intention was to write a much shorter book. But I found that to do 
justice to the beautiful mathematics on which Digital Communications is based I 
had to expand the book. 


Most physical layer communication problems are at their core of a continuous- 
time nature. The transmitted physical waveforms are functions of time and not 
sequences synchronized to a clock. But most solutions first reduce the problem to a 
discrete-time setting and then solve the problem in the discrete-time domain. The 
reduction to discrete-time often requires great ingenuity, which I try to describe. 
This reduction is often taken for granted in courses that open with a discrete-time model from
Lecture 1. I emphasize that most communication problems are of a continuous- 
time nature, and that the reduction to discrete-time is not always trivial or even 
possible. For example, it is extremely difficult to translate a peak-power constraint 
(stating that at no epoch is the magnitude of the transmitted waveform allowed to 
exceed a given constant) to a statement about the sequence that is used to represent 
the waveform. Similarly, in Wireless Communications it is often very difficult to 
reduce the received waveform to a sequence without any loss in performance. 


The quest for mathematical precision can be demanding. I have therefore tried to 
precede the statement of every key theorem with its gist in plain English. Instruc- 
tors may well choose to present the material in class with less rigor and direct the 
students to the book for a more mathematical approach. I would rather have text- 
books be more mathematical than the lectures than the other way round. Having 
a rigorous textbook allows the instructor in class to discuss the intuition knowing 
that the students can obtain the technical details from the book at home. 


The communication problem comes with a beautiful geometric picture that I try 
to emphasize. To appreciate this picture one needs the definition of the inner 
product between energy-limited signals and some of the geometry of the space of 
energy-limited signals. These are therefore introduced early on in Chapters 3 and 4. 
Chapters 5 and 6 cover standard material from Linear Systems. But note the early 
introduction of the matched filter as a mechanism for computing inner products 
in Section 5.8. Also key is Parseval’s Theorem in Section 6.2.2 which relates the 
geometric pictures in the time domain and in the frequency domain. 


Chapter 7 deals with passband signals and their baseband representation. We em- 
phasize how the inner product between passband signals is related to the inner 
product between their baseband representations. This elegant geometric relation- 
ship is often lost in the haze of various trigonometric identities. While this topic is 
important in wireless applications, it is not always taught in a first course in Digital 
Communications. Instructors who prefer to discuss baseband communication only 
can skip Chapters 7, 9, 16, 17, 18, 24, 27, and Sections 26.10 and 28.5. But it would
be a shame. 


Chapter 8 presents the celebrated Sampling Theorem from a geometric perspective. 
It is inessential to the rest of the book but is a striking example of the geometric 
approach. Chapter 9 discusses the Sampling Theorem for passband signals. 


Chapter 10 discusses modulation. I have tried to motivate Linear Modulation
and Pulse Amplitude Modulation and to minimize the use of the “that’s just how 
it is done” argument. The use of the Matched Filter for detecting (here in the 
absence of noise) is emphasized. This also motivates the Nyquist Theory, which is 
treated in Chapter 11. I stress that the motivation for the Nyquist Theory is not 
to avoid inter-symbol interference at the sampling points but rather to guarantee 
the orthogonality of the time shifts of the pulse shape by integer multiples of the 
baud period. This ultimately makes more engineering sense and leads to cleaner 
mathematics: compare Theorem 11.3.2 with its corollary, Corollary 11.3.4. 


The result of modulating random bits is a stochastic process, a concept which is 
first encountered in Chapter 10; formally defined in Chapter 12; and revisited in 
Chapters 13, 17, and 25. It is an important concept in Digital Communications, 
and I find it best to first introduce man-made synthesized stochastic processes 
(as the waveforms produced by an encoder when fed random bits) and only later 
to introduce the nature-made stochastic processes that model noise. Stationary 
discrete-time stochastic processes are introduced in Chapter 13 and their complex 
counterparts in Chapter 17. These are needed for the analysis in Chapter 14 of the 
power in Pulse Amplitude Modulation and for the analysis in Chapter 17 of the 
power in Quadrature Amplitude Modulation. 


I emphasize that power is a physical quantity that is related to the time-averaged 
energy in the continuous-time transmitted waveform. Its relation to the power in the
discrete-time modulating sequence is a nontrivial result. In deriving this relation 
I refrain from adding random timing jitters that are often poorly motivated and 
that turn out to be unnecessary. (The transmitted power does not depend on the 
realization of the fictitious jitter.) The Power Spectral Density in Pulse Amplitude 
Modulation and Quadrature Amplitude Modulation is discussed in Chapters 15 
and 18. The discussion requires a definition for Power Spectral Density for non- 
stationary processes (Definitions 15.3.1 and 18.4.1) and a proof that this definition 
coincides with the classical definition when the process is wide-sense stationary 
(Theorem 25.14.3). 


Chapter 19 opens the second part of the book, which deals with noise and detection. 
It introduces the univariate Gaussian distribution and some related distributions. 
The principles of Detection Theory are presented in Chapters 20-22. I emphasize 
the notion of Sufficient Statistics, which is central to Detection Theory. Building 
on Chapter 19, Chapter 23 introduces the all-important multivariate Gaussian 
distribution. Chapter 24 treats the complex case. 


Chapter 25 deals with continuous-time stochastic processes with an emphasis on 
stationary Gaussian processes, which are often used to model the noise in Digital 
Communications. This chapter also introduces white Gaussian noise. My approach 
to this topic is perhaps new and is probably where this text differs the most from 
other textbooks on the subject. 


I define white Gaussian noise of double-sided power spectral density N₀/2
with respect to the bandwidth W as any measurable,¹ stationary, Gaussian
stochastic process whose power spectral density is a nonnegative, symmetric,
integrable function of frequency that is equal to N₀/2 at all frequencies f satisfying
|f| ≤ W. The power spectral density at other frequencies can be arbitrary. An
example of the power spectral density of such a process is depicted in Figure 1.
Adopting this definition has a number of advantages. The first is, of course, that
such processes exist. One need not discuss “generalized processes,” Gaussian
processes with infinite variances (that, by definition, do not exist), or introduce the
Itô calculus to study stochastic integrals. (Stochastic integrals with respect to the
Brownian motion are mathematically intricate and physically unappealing. The
idea of the noise having infinite power is ludicrous.) The above definition also frees
me from discussing Dirac’s Delta, and, in fact, Dirac’s Delta is never used in this
book. (A rigorous treatment of Generalized Functions is beyond the engineering
curriculum in most schools, so using Dirac’s Delta always gives the reader the
unsettling feeling of being on unsure footing.)


Figure 1: The power spectral density of a white Gaussian noise process of
double-sided power spectral density N₀/2 with respect to the bandwidth W.


¹This book does not assume any Measure Theory and does not teach any Measure Theory.
(I do define sets of Lebesgue measure zero in order to be able to state uniqueness theorems.) I
use Measure Theory only in stating theorems that require measurability assumptions. This is
in line with my attempt to state theorems together with all the assumptions that are required
for their validity. I recommend that students ignore measurability issues and just make a mental
note that whenever measurability is mentioned there is a minor technical condition lurking in the
background.


The detection problem in white Gaussian noise is treated in Chapter 26. No course
in Digital Communications should end without Theorem 26.4.1. Roughly speaking,
this theorem states that if the mean-signals are bandlimited to W Hz and if
the noise is white Gaussian noise with respect to the bandwidth W, then the inner
products between the received signal and the mean-signals form a sufficient statistic.
Numerous examples as well as a treatment of colored noise are also discussed
in this chapter. Extensions to noncoherent detection are addressed in Chapter 27
and implications for Pulse Amplitude Modulation and for Quadrature Amplitude
Modulation in Chapter 28.


The book concludes with Chapter 29, which introduces Coding. It emphasizes how
the code design influences the transmitted power, the transmitted power spectral
density, the required bandwidth, and the probability of error. The construction of
good codes is left to texts on Coding Theory.


Basic Latin


Mathematics sometimes reads like a foreign language. I therefore include here a 
short glossary for such terms as “i.e.,” “that is,” “in particular,” “a fortiori,” “for 
example,” and “e.g.,” whose meaning in Mathematics is slightly different from the 
definition you will find in your English dictionary. In mathematical contexts these 
terms are actually logical statements that the reader should verify. Verifying these 
statements is an important way to make sure that you understand the math. 


What are these logical statements? First note the synonym “i.e.” = “that is” and 
the synonym “e.g.” = “for example.” Next note that the term “that is” often 
indicates that the statement following the term is equivalent to the one preceding 
it: “We next show that p is a prime, i.e., that p is a positive integer that is not 
divisible by any number other than one and itself.” The terms “in particular” 
or “a fortiori” indicate that the statement following them is implied by the one 
preceding them: “Since g(·) is differentiable and, a fortiori, continuous, it follows
from the Mean Value Theorem that the integral of g(·) over the interval [0,1] is
equal to g(ξ) for some ξ ∈ [0,1].” The term “for example” can have its regular
day-to-day meaning but in mathematical writing it also sometimes indicates that
the statement following it implies the one preceding it: “Suppose that the function
g(·) is monotonically nondecreasing, e.g., that it is differentiable with a nonnegative
derivative.” 


Another important word to look out for is “indeed,” which in this book typically 
signifies that the statement just made is about to be expanded upon and explained. 
So when you read something that is unclear to you, be sure to check whether the 
next sentence begins with the word “indeed” before you panic. 


The Latin phrases “a priori” and “a posteriori” show up in Probability Theory. 
The former is usually associated with the unconditional probability of an event and 
the latter with the conditional. Thus, the “a priori” probability that the sun will 
shine this Sunday in Zurich is 25%, but now that I know that it is raining today, 
my outlook on life changes and I assign this event the a posteriori probability of 
15%. 


The phrase “prima facie” is roughly equivalent to the phrase “before any further 
mathematical arguments have been presented.” For example, the definition of the 
projection of a signal v onto the signal u as the vector w that is collinear with u and 
for which v − w is orthogonal to u, may be followed by the sentence: “Prima facie,
it is not clear that the projection always exists and that it is unique. Nevertheless, 
as we next show, this is the case.” 


Syllabuses or Syllabi 


The book can be used as a textbook for a number of different courses. For a course 
that focuses on deterministic signals one could use Chapters 1-9 & Chapter 11. 
A course that covers Stochastic Processes and Detection Theory could be based 
on Chapter 12 and Chapters 19-26 with or without discrete-time stochastic pro- 
cesses (Chapter 13) and with or without complex random variables and processes 
(Chapters 17 & 24).


For a course on Digital Communications one could use the entire book or, if time 
does not permit it, discuss only baseband communication. In the latter case one 
could omit Chapters 7, 9, 16, 17, 18, 24, 27, and Section 28.5.


The dependencies between the chapters are depicted on Page xxiii. 


A web page for this book can be found at 


www.afoundationindigitalcommunication.ethz.ch 


A Dependency Diagram.


Chapter 1


Some Essential Notation 


Reading a whole chapter about notation can be boring. We have thus chosen to 
collect here only the essentials and to introduce the rest when it is first used. The 
“List of Symbols” on Page 704 is more comprehensive. 


We denote the set of complex numbers by C, the set of real numbers by R, the set 
of integers by Z, and the set of natural numbers (positive integers) by N. Thus, 


\[ \mathbb{N} = \{ n \in \mathbb{Z} : n \ge 1 \}. \]


The above equation is not meant to belabor the point. We use it to introduce the 
notation 


\[ \{ x \in A : \text{statement} \} \]


for the set consisting of all those elements of the set A for which “statement” holds. 


In treating real numbers, we use the notation (a,b), [a,b), [a,b], (a, b] to denote 
open, half open on the right, closed, and half open on the left intervals of the real 
line. Thus, for example, 


\[ (a, b) = \{ t \in \mathbb{R} : a < t < b \}. \]


A statement followed by a comma and a condition indicates that the statement 
holds whenever the condition is satisfied. For example, 


\[ |a_n - a| < \epsilon, \quad n \ge n_0 \]


means that |aₙ − a| < ε whenever n ≥ n₀.


We use I{statement} to denote the indicator of the statement. It is equal to 1, if 
the statement is true, and it is equal to 0, if the statement is false. Thus 


\[
I\{\text{statement}\} =
\begin{cases}
1 & \text{if statement is true,} \\
0 & \text{if statement is false.}
\end{cases}
\]




In dealing with complex numbers we use i to denote the purely imaginary unit- 
magnitude complex number 
\[ i = \sqrt{-1}. \]


We use z* to denote the complex conjugate of z, we use Re(z) to denote the real 
part of z, we use Im(z) to denote the imaginary part of z, and we use |z| to denote 
the absolute value (or “modulus,” or “complex magnitude”) of z. Thus, if z = a + ib,
where a, b ∈ R, then z* = a − ib, Re(z) = a, Im(z) = b, and |z| = √(a² + b²).


The notation used to define functions is extremely important and is, alas, sometimes
confusing to students, so please pay attention. A function or a mapping
associates with each element in its domain a unique element in its range. If a
function has a name, the name is often written in bold as in u.¹ Alternatively, we
sometimes denote a function u by u(·). The notation


\[ u \colon A \to B \]


indicates that u is a function of domain A and range B. The rule specifying for
each element of the domain the element in the range to which it is mapped is often
written to the right or underneath. Thus, for example,


\[ u \colon \mathbb{R} \to (-5, \infty), \quad t \mapsto t^2 \]


indicates that the domain of the function u is the reals, that its range is the set
of real numbers that exceed −5, and that u associates with t the nonnegative
number t². We write u(t) for the result of applying the mapping u to t. The
image of a mapping u: A → B is the set of all elements of the range B to which
at least one element in the domain is mapped by u:


\[ (\text{image of } u \colon A \to B) = \{ u(x) : x \in A \}. \tag{1.1} \]


The image of a mapping is a subset of its range. In the above example, the image
of the mapping is the set of nonnegative reals [0, ∞). A mapping u: A → B is said
to be onto (or surjective) if its image is equal to its range. Thus, u: A → B is
onto if, and only if, for every y ∈ B there corresponds some x ∈ A (not necessarily
unique) such that u(x) = y. If the image of g(·) is a subset of the domain of
h(·), then the composition of g(·) and h(·) is the mapping x ↦ h(g(x)), which is
denoted by h ∘ g.
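

For finite sets the notions of image, onto, and composition can be checked
mechanically. The following Python sketch is illustrative only: the helper names
`image`, `is_onto`, and `compose`, as well as the finite sets A and B standing in
for the domain and range of the squaring example, are assumptions of this sketch
and not notation used in the book.

```python
def image(u, domain):
    """Return the image {u(x) : x in domain} of the mapping u."""
    return {u(x) for x in domain}


def is_onto(u, domain, range_):
    """A mapping is onto if, and only if, its image equals its range."""
    return image(u, domain) == set(range_)


def compose(h, g):
    """Return the composition h o g, i.e., the mapping x -> h(g(x))."""
    return lambda x: h(g(x))


# Hypothetical finite stand-ins for the domain and the range.
A = {-2, -1, 0, 1, 2}
B = {-4, 0, 1, 4}


def u(t):
    """The squaring rule t -> t^2 from the example, restricted to A."""
    return t * t


img = image(u, A)          # a subset of the range B
onto = is_onto(u, A, B)    # False: no element of A is mapped to -4


def h(y):
    return y + 1


h_after_u = compose(h, u)  # the mapping t -> t^2 + 1
```

Shrinking the declared range to the image makes the same rule onto, which is why
the range is part of the specification of a function and not a property of the
rule alone.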


Sometimes we do not specify the domain and range of a function if they are clear
from the context. Thus, we might write u: t ↦ v(t) cos(2πf_c t) without making
explicit what the domain and range of u are. In fact, if there is no need to give a
function a name, then we will not. For example, we might write t ↦ v(t) cos(2πf_c t)
to designate the unnamed function that maps t to v(t) cos(2πf_c t). (Here v(·) is
some other function, which was presumably defined before.)


If the domain of a function u is R and if the range is R, then we sometimes say
that u is a real-valued signal or a real signal, especially if the argument of u
stands for time. Similarly we shall sometimes refer to a function u: R → C as a
complex-valued signal or a complex signal. If we refer to u as a signal, then
the question whether it is complex-valued or real-valued should be clear from the
context, or else immaterial to the claim.


¹But some special functions such as the self-similarity function R_gg, the autocovariance
function K_XX, and the power spectral density S_XX, which will be introduced in later
chapters, are not in boldface.


We caution the reader that, while u and u(-) denote functions, u(t) denotes the 
result of applying u to t. If u is a real-valued signal then u(t) is a real number! 


Given two signals u and v we define their superposition or sum as the signal
t ↦ u(t) + v(t). We denote this signal by u + v. Also, if α ∈ C and u is any signal,
then we define the amplification of u by α as the signal t ↦ αu(t). We denote
this signal by αu. Thus,


\[ \alpha u + \beta v \]


is the signal


\[ t \mapsto \alpha u(t) + \beta v(t). \]


We refer to the function that maps every element in its domain to zero as the
all-zero function and we denote it by 0. The all-zero signal 0 maps every t ∈ R
to zero. If x: R → C is a signal that maps every t ∈ R to x(t), then its reflection
or mirror image is denoted by $\overleftarrow{x}$ and is the signal that is defined by


\[ \overleftarrow{x} \colon t \mapsto x(-t). \]
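

The distinction between a signal u (a function) and its value u(t) (a number)
is easy to mirror in code. The sketch below represents signals as Python
callables; the names `superpose`, `amplify`, `reflect`, and `all_zero`, and the
particular signals and coefficients, are hypothetical choices made only for
illustration.

```python
def superpose(u, v):
    """Return the signal u + v, i.e., the mapping t -> u(t) + v(t)."""
    return lambda t: u(t) + v(t)


def amplify(alpha, u):
    """Return the signal alpha u, i.e., the mapping t -> alpha * u(t)."""
    return lambda t: alpha * u(t)


def reflect(x):
    """Return the mirror image of x, i.e., the mapping t -> x(-t)."""
    return lambda t: x(-t)


def all_zero(t):
    """The all-zero signal maps every t to zero."""
    return 0


# Two example signals (arbitrary choices for this sketch).
def u(t):
    return t + 1j * t * t


def v(t):
    return 2.0 * t


alpha, beta = 2 + 1j, -1j

# w is the signal alpha u + beta v; w(1.0) is a complex number, not a signal.
w = superpose(amplify(alpha, u), amplify(beta, v))
u_reflected = reflect(u)
```

Each helper returns a new function and evaluates nothing until a time instant
is supplied, in line with the convention that αu + βv names a signal while
αu(t) + βv(t) names a number.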


Dirac’s Delta (which will hardly be mentioned in this book) is not a function. 


A probability space is defined as a triplet (Ω, ℱ, P), where the set Ω is the set of
experiment outcomes, the elements of the set ℱ are subsets of Ω and are called
events, and where P: ℱ → [0,1] assigns probabilities to the various events. It is
assumed that ℱ forms a σ-algebra, i.e., that Ω ∈ ℱ; that if a set is in ℱ then so
is its complement (with respect to Ω); and that every finite or countable union of
elements of ℱ is also an element of ℱ. A random variable X is a mapping from Ω
to R that satisfies the technical condition that


\[ \{ \omega \in \Omega : X(\omega) \le \xi \} \in \mathcal{F}, \quad \xi \in \mathbb{R}. \tag{1.2} \]


This condition guarantees that it is always meaningful to evaluate the probability
that the value of X is smaller than or equal to ξ.
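

On a finite outcome set both the σ-algebra axioms and condition (1.2) can be
verified by enumeration. The following sketch does so under assumptions of its
own: the function names `is_sigma_algebra` and `is_random_variable`, the
four-element outcome set, and the two candidate mappings are hypothetical
examples, not taken from the book.

```python
from itertools import combinations


def is_sigma_algebra(omega, events):
    """Check the sigma-algebra axioms for a collection of subsets of a finite omega."""
    omega = frozenset(omega)
    events = {frozenset(e) for e in events}
    if omega not in events:
        return False
    # Closure under complements with respect to omega.
    if any(omega - e not in events for e in events):
        return False
    # For a finite collection, closure under pairwise unions implies closure
    # under all finite unions, and there are no other countable unions.
    return all(a | b in events for a, b in combinations(events, 2))


def is_random_variable(X, omega, events):
    """Check condition (1.2), assuming that events is a sigma-algebra on omega."""
    events = {frozenset(e) for e in events}
    # It suffices to test xi at the values X takes: any other xi yields
    # one of these sets or the empty set, which every sigma-algebra contains.
    return all(
        frozenset(w for w in omega if X(w) <= xi) in events
        for xi in {X(w) for w in omega}
    )


omega = {1, 2, 3, 4}
events = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]


def coarse(w):
    """Constant on {1, 2} and on {3, 4}."""
    return 0 if w <= 2 else 1


def fine(w):
    """Separates outcomes 1 and 2, which no event in `events` can do."""
    return w


events_ok = is_sigma_algebra(omega, events)
coarse_ok = is_random_variable(coarse, omega, events)
fine_ok = is_random_variable(fine, omega, events)
```

The mapping `fine` fails because the set of outcomes on which it is at most 1
is {1}, which is not an event; this is the kind of situation that the technical
condition (1.2) rules out.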


Chapter 2 


Signals, Integrals, and Sets of Measure Zero 


2.1 Introduction 


The purpose of this chapter is not to develop the Lebesgue theory of integration. 
Mastering this theory is not essential to understanding Digital Communications. 
But some concepts from this theory are needed in order to state the main results 
of Digital Communications in a mathematically rigorous way. In this chapter 
we introduce these required concepts and provide references to the mathematical 
literature that develops them. 


The less mathematically-inclined may gloss over most of this chapter. Readers 
who interpret the integrals in this book as Riemann integrals; who interpret “mea- 
surable” as “satisfying a minor mathematical restriction”; who interpret “a set of 
Lebesgue measure zero” as “a set that is so small that integrals of functions are 
not sensitive to the value the integrand takes in this set”; and who swap orders of 
summations, expectations and integrations fearlessly will not miss any engineering 
insights. 

But all readers should pay attention to the way the integral of complex-valued 
signals is defined (Section 2.3); to the basic inequality (2.13); and to the notation 
introduced in (2.6). 


2.2 Integrals 


Recall that a real-valued signal u is a function u: R → R. The integral of u is
denoted by


\[ \int_{-\infty}^{\infty} u(t) \, dt. \tag{2.1} \]


For (2.1) to be meaningful some technical conditions must be met. (You may
recall from your calculus studies, for example, that not every function is Riemann
integrable.) In this book all integrals will be understood to be Lebesgue integrals,
but nothing essential will be lost on readers who interpret them as Riemann
integrals. For the Lebesgue integral to be defined the integrand u must be a Lebesgue
measurable function. Again, do not worry if you have not studied the Lebesgue
integral or the notion of measurable functions. We point this out merely to cover
ourselves when we state various theorems. Also, for the integral in (2.1) to be
defined we insist that


\[ \int_{-\infty}^{\infty} |u(t)| \, dt < \infty. \tag{2.2} \]


(There are ways of defining the integral in (2.1) also when (2.2) is violated, but
they lead to fragile expressions that are difficult to manipulate.)


A function u: R → R which is Lebesgue measurable and which satisfies (2.2) is
said to be integrable, and we denote the set of all such functions by ℒ₁. We shall
refrain from integrating functions that are not elements of ℒ₁.


2.3. Integrating Complex-Valued Signals 


This section should assuage your fear of integrating complex-valued signals. (Some 
of you may have a trauma from your Complex Analysis courses where you dealt 
with integrals of functions from the complex plane to the complex plane. Here 
things are much simpler because we are dealing only with integrals of functions 
from the real line to the complex plane.) We formally define the integral of a
complex-valued function u: R → C by


\[
\int_{-\infty}^{\infty} u(t) \, dt \triangleq
\int_{-\infty}^{\infty} \operatorname{Re}\bigl(u(t)\bigr) \, dt
+ i \int_{-\infty}^{\infty} \operatorname{Im}\bigl(u(t)\bigr) \, dt.
\tag{2.3}
\]


For this to be meaningful, we require that the real functions t ↦ Re(u(t)) and
t ↦ Im(u(t)) both be integrable real functions. That is, they should both be
Lebesgue measurable and we should have


\[
\int_{-\infty}^{\infty} \bigl|\operatorname{Re}\bigl(u(t)\bigr)\bigr| \, dt < \infty
\quad \text{and} \quad
\int_{-\infty}^{\infty} \bigl|\operatorname{Im}\bigl(u(t)\bigr)\bigr| \, dt < \infty.
\tag{2.4}
\]


It is not difficult to show that (2.4) is equivalent to the more compact condition


\[ \int_{-\infty}^{\infty} |u(t)| \, dt < \infty. \tag{2.5} \]


We say that a complex signal u: R → C is Lebesgue measurable if the mappings
t ↦ Re(u(t)) and t ↦ Im(u(t)) are Lebesgue measurable real signals. We say that
a function u: R → C is integrable if it is Lebesgue measurable and (2.4) holds.
The set of all Lebesgue measurable integrable complex signals is denoted by ℒ₁.
Note that we use the same symbol ℒ₁ to denote both the set of integrable real
signals and the set of integrable complex signals. To which of these two sets we
refer should be clear from the context, or else immaterial.


For u ∈ ℒ₁ we define ‖u‖₁ as


\[ \|u\|_1 \triangleq \int_{-\infty}^{\infty} |u(t)| \, dt. \tag{2.6} \]
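

Readers who think of these integrals as Riemann integrals can reproduce (2.3)
and (2.6) numerically. The sketch below is a rough illustration under stated
assumptions: it uses a midpoint Riemann sum on a truncated interval, and the
helper names, the truncation point T, the number of cells n, and the example
signal t ↦ e^{−|t|} e^{it} are all choices made for this sketch. For this signal
the integral equals 1 and the norm ‖u‖₁ equals 2 in closed form.

```python
import cmath
import math


def riemann_integral(f, a, b, n):
    """Midpoint Riemann sum of a real-valued f over [a, b] using n cells."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def complex_integral(u, a, b, n):
    """Integrate a complex signal as in (2.3): real and imaginary parts separately."""
    re = riemann_integral(lambda t: u(t).real, a, b, n)
    im = riemann_integral(lambda t: u(t).imag, a, b, n)
    return complex(re, im)


def l1_norm(u, a, b, n):
    """Approximate the norm in (2.6) by integrating t -> |u(t)|."""
    return riemann_integral(lambda t: abs(u(t)), a, b, n)


def u(t):
    """Example signal t -> exp(-|t|) exp(it), which is integrable."""
    return math.exp(-abs(t)) * cmath.exp(1j * t)


# Truncation to [-T, T] is an approximation; the neglected tails are of order exp(-T).
T, n = 40.0, 80_000
integral = complex_integral(u, -T, T, n)  # close to 1 + 0j
norm = l1_norm(u, -T, T, n)               # close to 2
```

The imaginary part vanishes because t ↦ e^{−|t|} sin(t) is odd, and the norm
exceeds the magnitude of the integral, in agreement with (2.13).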


Before summarizing the key properties of the integral of complex signals we remind
the reader that if u and v are complex signals and if α, β are complex numbers, then
the complex signal αu + βv is defined as the complex signal t ↦ αu(t) + βv(t). The
intuition for the following proposition comes from thinking about the integrals as
Riemann integrals, which can be approximated by finite sums and by then invoking
the analogous results about finite sums.


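Proposition 2.3.1 (Properties of Complex Integrals). Let the complex signals u, v
be in ℒ₁, and let α, β be arbitrary complex numbers.


(i) Integration is linear in the sense that αu + βv ∈ ℒ₁ and


\[
\int_{-\infty}^{\infty} \bigl( \alpha u(t) + \beta v(t) \bigr) \, dt
= \alpha \int_{-\infty}^{\infty} u(t) \, dt + \beta \int_{-\infty}^{\infty} v(t) \, dt.
\tag{2.7}
\]


(ii) Integration commutes with complex conjugation


\[
\int_{-\infty}^{\infty} u^*(t) \, dt = \Bigl( \int_{-\infty}^{\infty} u(t) \, dt \Bigr)^*.
\tag{2.8}
\]


(iii) Integration commutes with the operation of taking the real part


\[
\operatorname{Re}\Bigl( \int_{-\infty}^{\infty} u(t) \, dt \Bigr)
= \int_{-\infty}^{\infty} \operatorname{Re}\bigl(u(t)\bigr) \, dt.
\tag{2.9}
\]


(iv) Integration commutes with the operation of taking the imaginary part


\[
\operatorname{Im}\Bigl( \int_{-\infty}^{\infty} u(t) \, dt \Bigr)
= \int_{-\infty}^{\infty} \operatorname{Im}\bigl(u(t)\bigr) \, dt.
\tag{2.10}
\]


Proof. For a proof of (i) see, for example, (Rudin, 1974, Theorem 1.32). The rest
of the claims follow easily from the definition of the integral of a complex-valued
signal (2.3). □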


2.4 An Inequality for Integrals 


Probably the most important inequality for complex numbers is the Triangle 
Inequality for Complex Numbers 


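\[ |w + z| \le |w| + |z|, \quad w, z \in \mathbb{C}. \tag{2.11} \]


This inequality extends by induction to finite sums:


\[
\Bigl| \sum_{j=1}^{n} z_j \Bigr| \le \sum_{j=1}^{n} |z_j|,
\quad z_1, \ldots, z_n \in \mathbb{C}.
\tag{2.12}
\]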


The extension to integrals is the most important inequality for integrals: 


Proposition 2.4.1. For every complex-valued or real-valued signal u in ℒ₁


\[
\Bigl| \int_{-\infty}^{\infty} u(t) \, dt \Bigr| \le \int_{-\infty}^{\infty} |u(t)| \, dt.
\tag{2.13}
\]


Proof. See, for example, (Rudin, 1974, Theorem 1.33).


Note that in (2.13) we should interpret |·| as the absolute-value function if u is a
real signal, and as the modulus function if u is a complex signal.


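Another simple but useful inequality is


\[ \|u + v\|_1 \le \|u\|_1 + \|v\|_1, \quad u, v \in \mathcal{L}_1, \tag{2.14} \]


which can be proved using the calculation


\[
\begin{aligned}
\|u + v\|_1 &\triangleq \int_{-\infty}^{\infty} |u(t) + v(t)| \, dt \\
&\le \int_{-\infty}^{\infty} \bigl( |u(t)| + |v(t)| \bigr) \, dt \\
&= \int_{-\infty}^{\infty} |u(t)| \, dt + \int_{-\infty}^{\infty} |v(t)| \, dt \\
&= \|u\|_1 + \|v\|_1,
\end{aligned}
\]


where the inequality follows by applying the Triangle Inequality for Complex
Numbers (2.11) with the substitution of u(t) for w and v(t) for z.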
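

Both (2.13) and (2.14) can be observed on Riemann sums, where they reduce to
the finite-sum inequality (2.12). The sketch below is only a numerical
illustration: the helper `midpoint_sum`, the truncated interval, and the two
example signals are assumptions of the sketch, not part of the book.

```python
import cmath
import math


def midpoint_sum(f, a, b, n):
    """Midpoint Riemann sum of f over [a, b]; f may be real- or complex-valued."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def u(t):
    """A complex signal whose oscillation makes |integral| much smaller than ||u||_1."""
    return math.exp(-t * t) * cmath.exp(3j * t)


def v(t):
    """A real, negative signal that partly cancels the real part of u."""
    return -math.exp(-abs(t))


a, b, n = -20.0, 20.0, 40_000

# Inequality (2.13): the magnitude of the integral versus the integral of the magnitude.
lhs_213 = abs(midpoint_sum(u, a, b, n))
rhs_213 = midpoint_sum(lambda t: abs(u(t)), a, b, n)

# Inequality (2.14): the norm of the sum versus the sum of the norms.
norm_u = rhs_213
norm_v = midpoint_sum(lambda t: abs(v(t)), a, b, n)
norm_sum = midpoint_sum(lambda t: abs(u(t) + v(t)), a, b, n)
```

Here `lhs_213` is far smaller than `rhs_213`, which shows that (2.13) can be
very loose when the integrand oscillates.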


2.5 Sets of Lebesgue Measure Zero 


It is one of life’s minor grievances that the integral of a nonnegative function can 
be zero even if the function is not identically zero. For example, t ↦ I{t = 17} is a
nonnegative function whose integral is zero and which is nonetheless not identically
zero (it maps 17 to one). In this section we shall derive a necessary and sufficient
condition for the integral of a nonnegative function to be zero. This condition will
allow us later to state conditions under which various integral inequalities hold 
with equality. It will give mathematical meaning to the physical intuition that if 
the waveform describing some physical phenomenon (such as voltage over a resistor) 
is nonnegative and integrates to zero then “for all practical purposes” the waveform 
is zero. 


We shall define sets of Lebesgue measure zero and then show that a nonnegative
function u: R → [0, ∞) integrates to zero if, and only if, the set {t ∈ R : u(t) > 0}
is of Lebesgue measure zero. We shall then introduce the notation u ≡ v to indicate
that the set {t ∈ R : u(t) ≠ v(t)} is of Lebesgue measure zero.


It should be noted that since the integral is unaltered when the integrand is changed 
at a finite (or countable) number of points, it follows that any nonnegative function 
that is zero except at a countable number of points integrates to zero. The reverse, 
however, is not true. One can find nonnegative functions that integrate to zero
and that are nonzero on an uncountable set of points. 


The less mathematically inclined readers may skip the mathematical definition of 
sets of measure zero and just think of a subset of the real line as being of Lebesgue 
measure zero if it is so “small” that the integral of any function is unaltered when 
the values it takes in the subset are altered. Such readers should then think of the 
statement u ≡ v as indicating that u − v is just the result of altering the all-zero
signal 0 on a set of Lebesgue measure zero and that, consequently,


\[ \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt = 0. \]


Definition 2.5.1 (Sets of Lebesgue Measure Zero). We say that a subset 𝒩 of
the real line R is a set of Lebesgue measure zero (or a Lebesgue null set)
if for every ε > 0 we can find a sequence of intervals [a₁, b₁], [a₂, b₂], … such that
the total length of the intervals is smaller than or equal to ε


\[ \sum_{j=1}^{\infty} (b_j - a_j) \le \epsilon \tag{2.15a} \]


and such that the union of the intervals covers the set 𝒩


\[ \mathcal{N} \subseteq [a_1, b_1] \cup [a_2, b_2] \cup \cdots. \tag{2.15b} \]


As an example, note that the set {1} is of Lebesgue measure zero. Indeed, it is 
covered by the single interval [1 − ε/2, 1 + ε/2] whose length is ε. Similarly, any 
finite set is of Lebesgue measure zero. Indeed, the set {α_1, ..., α_n} can be covered 
by n intervals of total length not exceeding ε as follows: 

$$\{\alpha_1, \ldots, \alpha_n\} \subset \Bigl[\alpha_1 - \frac{\epsilon}{2n}, \, \alpha_1 + \frac{\epsilon}{2n}\Bigr] \cup \cdots \cup \Bigl[\alpha_n - \frac{\epsilon}{2n}, \, \alpha_n + \frac{\epsilon}{2n}\Bigr].$$


This argument can also be extended to show that any countable set is of Lebesgue 
measure zero. Indeed, the countable set {α_1, α_2, ...} can be covered as 

$$\{\alpha_1, \alpha_2, \ldots\} \subseteq \bigcup_{j=1}^{\infty} \bigl[\alpha_j - 2^{-j-1}\epsilon, \; \alpha_j + 2^{-j-1}\epsilon\bigr],$$

where we note that the length of the interval [α_j − 2^{−j−1}ε, α_j + 2^{−j−1}ε] is 2^{−j}ε, 
which when summed over j yields ε. 
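
The covering argument can be tried out on a finite prefix of a countable set. The following is a minimal sketch, not part of the original text: the helper name `dyadic_cover`, the choice of the points 1, 1/2, 1/3, ..., and the value of ε are all illustrative assumptions. Exact rational arithmetic is used so that the comparison with ε is not blurred by rounding.

```python
from fractions import Fraction

def dyadic_cover(points, eps):
    """Cover the j-th point p (j = 1, 2, ...) by [p - eps/2**(j+1), p + eps/2**(j+1)]."""
    return [(p - eps / 2 ** (j + 1), p + eps / 2 ** (j + 1))
            for j, p in enumerate(points, start=1)]

# Hypothetical example: the first 20 elements of the countable set {1, 1/2, 1/3, ...}.
points = [Fraction(1, k) for k in range(1, 21)]
eps = Fraction(1, 1000)

intervals = dyadic_cover(points, eps)
total_length = sum(hi - lo for lo, hi in intervals)

# Each point lies in its own interval, and the lengths 2**(-j) * eps sum to less than eps.
covered = all(lo <= p <= hi for p, (lo, hi) in zip(points, intervals))
print(covered, total_length < eps)
```

However many points are included, the total length is ε(1 − 2^{−n}) for n points, which never exceeds ε.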


With a similar argument one can show that the union of a countable number of 
sets of Lebesgue measure zero is of Lebesgue measure zero. 


The above examples notwithstanding, it should be emphasized that there exist sets 
of Lebesgue measure zero that are not countable.¹ Thus, the concept of a set of 
Lebesgue measure zero is different from the concept of a countable set. 


Loosely speaking, we say that two signals are indistinguishable if they agree except 
possibly on a set of Lebesgue measure zero. We warn the reader, however, that 
this terminology is not standard. 


¹ For example, the Cantor set is of Lebesgue measure zero and uncountable; see (Rudin, 1976, 
Section 11.11, Remark (f), p. 309). 
Definition 2.5.2 (Indistinguishable Functions). We say that the Lebesgue measurable 
functions u, v from R to C (or to R) are indistinguishable and write 

$$u \equiv v$$

if the set {t ∈ R: u(t) ≠ v(t)} is of Lebesgue measure zero. 


Note that u ≡ v if, and only if, the signal u − v is indistinguishable from the 
all-zero signal 0: 

$$(u \equiv v) \iff (u - v \equiv 0). \tag{2.16}$$


The main result of this section is the following: 


Proposition 2.5.3. 


(i) A nonnegative Lebesgue measurable signal integrates to zero if, and only if, 
it is indistinguishable from the all-zero signal 0. 


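(ii) If u, v are Lebesgue measurable functions from R to C (or to R), then 

$$\left( \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt = 0 \right) \iff (u \equiv v) \tag{2.17}$$

and 

$$\left( \int_{-\infty}^{\infty} |u(t) - v(t)|^2 \, dt = 0 \right) \iff (u \equiv v). \tag{2.18}$$

(iii) If u and v are integrable and indistinguishable, then their integrals are equal: 

$$(u \equiv v) \implies \left( \int_{-\infty}^{\infty} u(t) \, dt = \int_{-\infty}^{\infty} v(t) \, dt \right), \quad u, v \in \mathcal{L}_1. \tag{2.19}$$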


Proof. The proof of (i) is not very difficult, but it requires more familiarity with 
Measure Theory than we are willing to assume. The interested reader is thus 
referred to (Rudin, 1974, Theorem 1.39). 


The equivalence in (2.17) follows by applying Part (i) to the nonnegative function 
t ↦ |u(t) − v(t)|. Similarly, (2.18) follows by applying Part (i) to the nonnegative 
function t ↦ |u(t) − v(t)|² and by noting that the set of t's for which |u(t) − v(t)|² ≠ 0 
is the same as the set of t's for which u(t) ≠ v(t). 


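Part (iii) follows from (2.17) by noting that 

$$\left| \int_{-\infty}^{\infty} u(t) \, dt - \int_{-\infty}^{\infty} v(t) \, dt \right| = \left| \int_{-\infty}^{\infty} \bigl( u(t) - v(t) \bigr) \, dt \right| \le \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt,$$

where the first equality follows by the linearity of integration, and where the 
subsequent inequality follows from Proposition 2.4.1. By (2.17), the RHS is zero 
whenever u ≡ v. 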
2.6 Swapping Integration, Summation, and Expectation 


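In numerous places in this text we shall swap the order of integration as in 

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} u(\alpha, \beta) \, d\alpha \right) d\beta = \int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} u(\alpha, \beta) \, d\beta \right) d\alpha \tag{2.20}$$

or the order of summation as in 

$$\sum_{\nu=1}^{\infty} \left( \sum_{\eta=1}^{\infty} a_{\nu,\eta} \right) = \sum_{\eta=1}^{\infty} \left( \sum_{\nu=1}^{\infty} a_{\nu,\eta} \right) \tag{2.21}$$

or the order of summation and integration as in 

$$\int_{-\infty}^{\infty} \left( \sum_{\nu=1}^{\infty} a_\nu u_\nu(t) \right) dt = \sum_{\nu=1}^{\infty} \left( a_\nu \int_{-\infty}^{\infty} u_\nu(t) \, dt \right) \tag{2.22}$$

or the order of integration and expectation as in 

$$\mathrm{E}\left[ X \int_{-\infty}^{\infty} u(t) \, dt \right] = \int_{-\infty}^{\infty} \mathrm{E}[X u(t)] \, dt = \mathrm{E}[X] \int_{-\infty}^{\infty} u(t) \, dt.$$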


These changes of order are usually justified using Fubini’s Theorem, which states 
that these changes of order are permissible provided that a very technical measura- 
bility condition is satisfied and that, in addition, either the integrand is nonnegative 
or that in some order (and hence in all orders) the integrals/summation/expectation 
of the absolute value of the integrand is finite. 


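For example, to justify (2.20) it suffices to verify that the function u: R² → R in 
(2.20) is Lebesgue measurable and that, in addition, it is either nonnegative or 

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} |u(\alpha, \beta)| \, d\alpha \right) d\beta < \infty$$

or 

$$\int_{-\infty}^{\infty} \left( \int_{-\infty}^{\infty} |u(\alpha, \beta)| \, d\beta \right) d\alpha < \infty.$$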


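Similarly, to justify (2.21) it suffices to show that a_{ν,η} ≥ 0 or that 

$$\sum_{\nu=1}^{\infty} \left( \sum_{\eta=1}^{\infty} |a_{\nu,\eta}| \right) < \infty$$

or that 

$$\sum_{\eta=1}^{\infty} \left( \sum_{\nu=1}^{\infty} |a_{\nu,\eta}| \right) < \infty.$$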


(No need to worry about measurability which is automatic in this setup.) 
As a final example, to justify (2.22) it suffices that the functions {u_ν} are all 
measurable and that either a_ν u_ν(t) is nonnegative for all ν ∈ N and t ∈ R or 

$$\int_{-\infty}^{\infty} \left( \sum_{\nu=1}^{\infty} |a_\nu| \, |u_\nu(t)| \right) dt < \infty$$

or 

$$\sum_{\nu=1}^{\infty} \left( |a_\nu| \int_{-\infty}^{\infty} |u_\nu(t)| \, dt \right) < \infty.$$


A precise statement of Fubini’s Theorem requires some Measure Theory that is 
beyond the scope of this book. The reader is referred to (Rudin, 1974, Theorem 
7.8) and (Billingsley, 1995, Chapter 3, Section 18) for such a statement and for a 
proof. 


We shall frequently use the swapping-of-order argument to manipulate the square 
of a sum or the square of an integral. 


Proposition 2.6.1. 


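(i) If Σ_{ν=1}^{∞} |a_ν| < ∞, then 

$$\left( \sum_{\nu=1}^{\infty} a_\nu \right)^2 = \sum_{\nu=1}^{\infty} \sum_{\nu'=1}^{\infty} a_\nu a_{\nu'}. \tag{2.23}$$

(ii) If u is an integrable real-valued or complex-valued signal, then 

$$\left( \int_{-\infty}^{\infty} u(\alpha) \, d\alpha \right)^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} u(\alpha) \, u(\alpha') \, d\alpha \, d\alpha'. \tag{2.24}$$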


Proof. The proof is a direct application of Fubini’s Theorem. But ignoring the 
technicalities, the intuition is quite clear: it all boils down to the fact that (a + b)² 
can be written as (a + b)(a + b), which can in turn be written as aa + ab + ba + bb. 
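
The finite version of this identity is easy to check by machine. The sketch below is only an illustration under assumptions of our choosing: the sequence a_ν = (−1/2)^ν and the truncation to 12 terms are hypothetical, and a finite check says nothing about the technical conditions needed for infinite sums.

```python
from fractions import Fraction

# Hypothetical absolutely summable sequence a_v = (-1/2)**v, truncated to 12 terms.
a = [Fraction(-1, 2) ** v for v in range(1, 13)]

# Left side of the identity: the square of the sum.
square_of_sum = sum(a) ** 2

# Right side: the double sum over all ordered pairs (v, v').
double_sum = sum(x * y for x in a for y in a)

print(square_of_sum == double_sum)
```

Exact fractions are used so that the two sides agree exactly rather than up to rounding.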


2.7 Additional Reading 


Numerous books cover the basics of Lebesgue integration. Classic examples are 
(Riesz and Sz.-Nagy, 1990), (Rudin, 1974) and (Royden, 1988). These texts also 
cover the notion of sets of Lebesgue measure zero, e.g., (Riesz and Sz.-Nagy, 
1990, Chapter 1, Section 2). For the changing of order of Riemann integration 
see (Körner, 1988, Chapters 47 & 48). 


2.8 Exercises 


Exercise 2.1 (Integrating an Exponential). Show that 


$$\int_{0}^{\infty} e^{-zt} \, dt = \frac{1}{z}, \quad \mathrm{Re}(z) > 0.$$
Exercise 2.2 (Triangle Inequality for Complex Numbers). Prove the Triangle Inequality 
for complex numbers (2.11). Under what conditions does it hold with equality? 


Exercise 2.3 (When Are Complex Numbers Equal?). Prove that if the complex numbers 
w and z are such that Re(βz) = Re(βw) for all β ∈ C, then w = z. 


Exercise 2.4 (An Integral Inequality). Show that if u, v, and w are integrable signals, 
then 


$$\int_{-\infty}^{\infty} |u(t) - w(t)| \, dt \le \int_{-\infty}^{\infty} |u(t) - v(t)| \, dt + \int_{-\infty}^{\infty} |v(t) - w(t)| \, dt.$$


Exercise 2.5 (An Integral to Note). Given some f € R, compute the integral 


$$\int_{-\infty}^{\infty} \mathrm{I}\{t = 17\} \, e^{-i 2 \pi f t} \, dt.$$


Exercise 2.6 (Subsets of Sets of Lebesgue Measure Zero). Show that a subset of a set 
of Lebesgue measure zero must also be of Lebesgue measure zero. 


Exercise 2.7 (Nonuniqueness of the Probability Density Function). We say that the 
random variable X is of density f_X(·) if f_X(·) is a (Lebesgue measurable) nonnegative 
function such that 

$$\Pr[X \le x] = \int_{-\infty}^{x} f_X(\xi) \, d\xi, \quad x \in \mathbb{R}.$$

Show that if X is of density f_X(·) and if g(·) is a nonnegative function that is 
indistinguishable from f_X(·), then X is also of density g(·). (The reverse is also true: if X is of 
density g_1(·) and also of density g_2(·), then g_1(·) and g_2(·) must be indistinguishable.) 


Exercise 2.8 (Indistinguishability). Let ψ: R² → R satisfy ψ(α, β) ≥ 0, for all α, β ∈ R, 
with equality only if α = β. Let u and v be Lebesgue measurable signals. Show that 

$$\left( \int_{-\infty}^{\infty} \psi\bigl(u(t), v(t)\bigr) \, dt = 0 \right) \implies (v \equiv u).$$


Exercise 2.9 (Indistinguishable Signals). Show that if the Lebesgue measurable signals g 
and h are indistinguishable, then the set of epochs t ∈ R where the sums Σ_{j=−∞}^{∞} g(t + j) 
and Σ_{j=−∞}^{∞} h(t + j) are different (in the sense that they both converge but to different 
limits or that one converges but the other does not) is of Lebesgue measure zero. 


Exercise 2.10 (Continuous Nonnegative Functions). A subset of R containing a nonempty 
open interval cannot be of Lebesgue measure zero. Use this fact to show that if a 
continuous function g: R → R is nonnegative except perhaps on a set of Lebesgue measure 
zero, then the exception set is empty and the function is nonnegative. 


Exercise 2.11 (Order of Summation Sometimes Matters). For every ν, η ∈ N define 

$$a_{\nu,\eta} = \begin{cases} 2 - 2^{-\nu} & \text{if } \nu = \eta, \\ -2 + 2^{-\nu} & \text{if } \nu = \eta + 1, \\ 0 & \text{otherwise.} \end{cases}$$

Show that (2.21) is not satisfied. See (Royden, 1988, Chapter 12, Section 4, Exercise 24). 
Exercise 2.12 (Using Fubini's Theorem). Using the relation 

$$\frac{1}{x} = \int_{0}^{\infty} e^{-xt} \, dt, \quad x > 0$$

and Fubini's Theorem, show that 

$$\lim_{\alpha \to \infty} \int_{0}^{\alpha} \frac{\sin x}{x} \, dx = \frac{\pi}{2}.$$

See (Rudin, 1974, Chapter 7, Exercise 12). 

Hint: See also Problem 2.1. 


Chapter 3 


The Inner Product 


3.1 The Inner Product 


The inner product is central to Digital Communications, so it is best to introduce 
it early. The motivation will have to wait. 


Recall that u: A → B indicates that u (sometimes denoted u(·)) is a function 
(or mapping) that maps each element in its domain A to an element in its 
range B. If both the domain and the range of u are the set of real numbers R, 
then we sometimes refer to u as being a real signal, especially if the argument of 
u(·) stands for time. Similarly, if u: R → C where C denotes the set of complex 
numbers and the argument of u(·) stands for time, then we sometimes refer to u 
as a complex signal. 


The inner product between two real functions u: R → R and v: R → R is 
denoted by ⟨u, v⟩ and is defined as 

$$\langle u, v \rangle \triangleq \int_{-\infty}^{\infty} u(t) \, v(t) \, dt, \tag{3.1}$$

whenever the integral is defined. (In Section 3.2 we shall study conditions 
under which the integral is defined, i.e., conditions on the functions u and v that 
guarantee that the product function t ↦ u(t) v(t) is an integrable function.) 


The signals that arise in our study of Digital Communications often represent 
electric fields or voltages over resistors. The energy required to generate them is 
thus proportional to the integral of their squared magnitude. This motivates us to 
define the energy of a Lebesgue measurable real-valued function u: R → R as 

$$\int_{-\infty}^{\infty} u^2(t) \, dt.$$

(If this integral is not finite, then we say that u is of infinite energy.) We say that 
u: R → R is of finite energy if it is Lebesgue measurable and if 

$$\int_{-\infty}^{\infty} u^2(t) \, dt < \infty.$$
The class of all finite-energy real-valued functions u: R → R is denoted by ℒ₂. 


Since the energy of u: R → R is nonnegative, we can discuss its nonnegative square 
root, which we denote¹ by ‖u‖₂: 

$$\|u\|_2 \triangleq \sqrt{\int_{-\infty}^{\infty} u^2(t) \, dt}. \tag{3.2}$$


(Throughout this book we denote by √ξ the nonnegative square root of ξ for every 
ξ ≥ 0.) We can now express the energy in u using the inner product as 

$$\|u\|_2^2 = \int_{-\infty}^{\infty} u^2(t) \, dt = \langle u, u \rangle. \tag{3.3}$$


In writing ‖u‖₂² above we used different fonts for the subscript and the superscript. 
The subscript is just a graphical character which is part of the notation ‖·‖₂. We 
could have replaced it with ♣ and designated the energy by ‖u‖²_♣ without any 
change in mathematical meaning.² The superscript, however, indicates that the 
quantity ‖u‖₂ is being squared. 


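For complex-valued functions u: R → C and v: R → C we define the inner product 
⟨u, v⟩ by 

$$\langle u, v \rangle \triangleq \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt, \tag{3.4}$$

whenever the integral is defined. Here v*(t) denotes the complex conjugate of v(t). 
The above integral in (3.4) is a complex integral, but that should not worry you: 
it can also be written as 

$$\langle u, v \rangle = \int_{-\infty}^{\infty} \mathrm{Re}\bigl( u(t) \, v^*(t) \bigr) \, dt + i \int_{-\infty}^{\infty} \mathrm{Im}\bigl( u(t) \, v^*(t) \bigr) \, dt, \tag{3.5}$$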


where i = √−1 and where Re(·) and Im(·) denote the functions that map a complex 
number to its real and imaginary parts: Re(a + ib) = a and Im(a + ib) = b whenever 
a, b ∈ R. Each of the two integrals appearing in (3.5) is the integral of a real signal. 
See Section 2.3. 


Note that (3.1) and (3.4) are in agreement in the sense that if u and v happen 
to take on only real values (i.e., satisfy that u(t), v(t) € R for every t € R), then 
viewing them as real functions and thus using (3.1) would yield the same inner 
product as viewing them as (degenerate) complex functions and using (3.4). Note 
also that for complex functions u,v: R — C the inner product (u, v) is in general 
not the same as (v,u). One is the complex conjugate of the other. 


¹ The subscript 2 is here to distinguish ‖u‖₂ from ‖u‖₁, where the latter was defined in (2.6) 
as ‖u‖₁ = ∫_{−∞}^{∞} |u(t)| dt. 

² We prefer ‖·‖₂ to ‖·‖_♣ because it reminds us that in the definition (3.2) the integrand is 
raised to the second power. This should be contrasted with the symbol ‖·‖₁ where the integrand 
is raised to the first power (and where no square root is taken of the result); see (2.6). 
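Some of the properties of the inner product between complex-valued functions 
u, v: R → C are given below. 

$$\langle u, v \rangle = \langle v, u \rangle^* \tag{3.6}$$
$$\langle \alpha u, v \rangle = \alpha \langle u, v \rangle, \quad \alpha \in \mathbb{C} \tag{3.7}$$
$$\langle u, \alpha v \rangle = \alpha^* \langle u, v \rangle, \quad \alpha \in \mathbb{C} \tag{3.8}$$
$$\langle u_1 + u_2, v \rangle = \langle u_1, v \rangle + \langle u_2, v \rangle \tag{3.9}$$
$$\langle u, v_1 + v_2 \rangle = \langle u, v_1 \rangle + \langle u, v_2 \rangle. \tag{3.10}$$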


The above equalities hold whenever the inner products appearing on the right- 
hand side (RHS) are defined. The reader is encouraged to produce a similar list of 
properties for the inner product between real-valued functions u,v: R — R. 
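
The conjugate symmetry and the conjugate-linearity in the second argument can be observed numerically. The following sketch approximates the inner product (3.4) by a midpoint Riemann sum; the helper `inner`, the test signals, the scalar `alpha`, and the restriction to the interval [0, 1] are all illustrative assumptions rather than anything prescribed by the text.

```python
import cmath

def inner(u, v, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [a, b]."""
    h = (b - a) / n
    mids = (a + (k + 0.5) * h for k in range(n))
    return h * sum(u(t) * v(t).conjugate() for t in mids)

# Hypothetical complex signals, taken to be zero outside [0, 1].
u = lambda t: complex(t, 1.0)                  # t + i
v = lambda t: cmath.exp(2j * cmath.pi * t)     # complex exponential
alpha = 2 - 3j

uv = inner(u, v)
vu = inner(v, u)
scaled = inner(u, lambda t: alpha * v(t))

# Conjugate symmetry: <u, v> = <v, u>*.
print(abs(uv - vu.conjugate()) < 1e-9)
# Scaling the second argument by alpha scales the inner product by alpha*.
print(abs(scaled - alpha.conjugate() * uv) < 1e-9)
```

Swapping the arguments conjugates the result, which is why the order matters for complex signals but not for real ones.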


The energy in a Lebesgue measurable complex-valued function u: R → C is defined as 

$$\int_{-\infty}^{\infty} |u(t)|^2 \, dt,$$

where |·| denotes absolute value so |a + ib| = √(a² + b²) whenever a, b ∈ R. This 
definition of energy might seem a bit contrived because there is no such thing 
as complex voltage, so prima facie it seems meaningless to define the energy of 
a complex signal. But this is not the case. Complex signals are used to repre- 
sent real passband signals, and the representation is such that the energy in the 
real passband signal is proportional to the integral of the squared modulus of the 
complex-valued signal representing it; see Section 7.6 ahead. 


Definition 3.1.1 (Energy-Limited Signal). We say that u: R → C is energy-limited 
or of finite energy if u is Lebesgue measurable and 

$$\int_{-\infty}^{\infty} |u(t)|^2 \, dt < \infty.$$

The set of all energy-limited complex-valued functions u: R → C is denoted by ℒ₂. 
Note that whether ℒ₂ stands for the class of energy-limited complex-valued or 
real-valued functions should be clear from the context, or else immaterial. 


For every u ∈ ℒ₂ we define ‖u‖₂ as the nonnegative square root of its energy 

$$\|u\|_2 = \sqrt{\langle u, u \rangle}, \tag{3.11}$$

so 

$$\|u\|_2 = \sqrt{\int_{-\infty}^{\infty} |u(t)|^2 \, dt}. \tag{3.12}$$

Again (3.12) and (3.2) are in agreement in the sense that for every u: R → R, 
computing ‖u‖₂ via (3.2) yields the same result as if we viewed u as mapping 
from R to C and computed ‖u‖₂ via (3.12). 


3.2 When Is the Inner Product Defined? 


As noted in Section 2.2, in this book we shall only discuss the integral of integrable 
functions, where a function u: R → R is integrable if it is Lebesgue measurable 
and if ∫_{−∞}^{∞} |u(t)| dt < ∞. (We shall sometimes make an exception for functions 
that take on only nonnegative values. If u: R → [0, ∞) is Lebesgue measurable 
and if ∫ u(t) dt is not finite, then we shall say that ∫ u(t) dt = +∞.) 

Similarly, as in Section 2.3, in integrating complex signals u: R → C we limit 
ourselves to signals that are integrable in the sense that both t ↦ Re(u(t)) and 
t ↦ Im(u(t)) are Lebesgue measurable real-valued signals and ∫_{−∞}^{∞} |u(t)| dt < ∞. 
Consequently, we shall say that the inner product between u: R → C and v: R → C 
is well-defined only when they are both Lebesgue measurable (thus implying that 
t ↦ u(t) v*(t) is Lebesgue measurable) and when 

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt < \infty. \tag{3.13}$$


We next discuss conditions on the Lebesgue measurable complex signals u and v 
that guarantee that (3.13) holds. The simplest case is when one of the functions, 
say u, is bounded and the other, say v, is integrable. Indeed, if σ_∞ ∈ R is such 
that |u(t)| ≤ σ_∞ for all t ∈ R, then |u(t) v(t)| ≤ σ_∞ |v(t)| and 

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt \le \sigma_\infty \int_{-\infty}^{\infty} |v(t)| \, dt = \sigma_\infty \|v\|_1,$$

where the RHS is finite by our assumption that v is integrable. 


Another case where the inner product is well-defined is when both u and v are of 
finite energy. To prove that in this case too the mapping t ↦ u(t) v(t) is integrable 
we need the inequality 

$$\alpha \beta \le \frac{1}{2} \bigl( \alpha^2 + \beta^2 \bigr), \quad \alpha, \beta \in \mathbb{R}, \tag{3.14}$$

which follows directly from the inequality (α − β)² ≥ 0 by simple algebra: 

$$0 \le (\alpha - \beta)^2 = \alpha^2 + \beta^2 - 2 \alpha \beta.$$

By substituting |u(t)| for α and |v(t)| for β in (3.14) we obtain the inequality 
|u(t) v(t)| ≤ (|u(t)|² + |v(t)|²)/2 and hence 

$$\int_{-\infty}^{\infty} |u(t) \, v(t)| \, dt \le \frac{1}{2} \int_{-\infty}^{\infty} |u(t)|^2 \, dt + \frac{1}{2} \int_{-\infty}^{\infty} |v(t)|^2 \, dt, \tag{3.15}$$

thus demonstrating that if both u and v are of finite energy (so the RHS is finite), 
then the inner product is well-defined, i.e., t ↦ u(t) v(t) is integrable. 


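As a by-product of this proof we can obtain an upper bound on the magnitude of 
the inner product in terms of the energies of u and v. All we need is the inequality 

$$\left| \int_{-\infty}^{\infty} f(\xi) \, d\xi \right| \le \int_{-\infty}^{\infty} |f(\xi)| \, d\xi$$

(see Proposition 2.4.1) to conclude from (3.15) that 

$$\begin{aligned}
|\langle u, v \rangle| &= \left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \\
&\le \int_{-\infty}^{\infty} |u(t)| \, |v(t)| \, dt \\
&\le \frac{1}{2} \int_{-\infty}^{\infty} |u(t)|^2 \, dt + \frac{1}{2} \int_{-\infty}^{\infty} |v(t)|^2 \, dt \\
&= \frac{1}{2} \bigl( \|u\|_2^2 + \|v\|_2^2 \bigr).
\end{aligned} \tag{3.16}$$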

This inequality will be improved in Theorem 3.3.1, which introduces the Cauchy- 
Schwarz Inequality. 


We finally mention here, without proof, a third case where the inner product 
between the Lebesgue measurable signals u, v is defined. The result here is that if 
for some numbers 1 < p, q < ∞ satisfying 1/p + 1/q = 1 we have that 

$$\int_{-\infty}^{\infty} |u(t)|^p \, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |v(t)|^q \, dt < \infty,$$

then t ↦ u(t) v(t) is integrable. The proof of this result follows from Hölder's 
Inequality; see Theorem 3.3.2. Notice that the second case we addressed (where u 
and v are both of finite energy) follows from this case by considering p = q = 2. 


3.3 The Cauchy-Schwarz Inequality 


The Cauchy-Schwarz Inequality is probably the most important inequality on the 
inner product. Its discrete version is attributed to Augustin-Louis Cauchy (1789- 
1857) and its integral form to Victor Yacovlevich Bunyakovsky (1804-1889) who 
studied with him in Paris. Its (double) integral form was derived independently by 
Hermann Amandus Schwarz (1843-1921). See (Steele, 2004, pp. 10-12) for more 
on the history of this inequality and on how inequalities get their names. 


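Theorem 3.3.1 (Cauchy-Schwarz Inequality). If the functions u, v: R → C are 
of finite energy, then the mapping t ↦ u(t) v*(t) is integrable and 

$$|\langle u, v \rangle| \le \|u\|_2 \, \|v\|_2. \tag{3.17}$$

That is, 

$$\left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \le \sqrt{\int_{-\infty}^{\infty} |u(t)|^2 \, dt} \, \sqrt{\int_{-\infty}^{\infty} |v(t)|^2 \, dt}.$$

Equality in the Cauchy-Schwarz Inequality is possible, e.g., if u is a scaled version 
of v, i.e., if for some constant α 

$$u(t) = \alpha \, v(t), \quad t \in \mathbb{R}.$$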
In fact, the Cauchy-Schwarz Inequality holds with equality if, and only if, either v(t) 
is zero for all t outside a set of Lebesgue measure zero or for some constant α we 
have u(t) = α v(t) for all t outside a set of Lebesgue measure zero. 


There are a number of different proofs of this important inequality. We shall focus 
here on one that is based on (3.16) because it demonstrates a general technique for 
improving inequalities. The idea is that once one obtains a certain inequality—in 
our case (3.16)—one can try to improve it by taking advantage of one’s under- 
standing of how the quantity in question is affected by various transformations. 
This technique is beautifully illustrated in (Steele, 2004). 


Proof. The quantity in question is |⟨u, v⟩|. We shall take advantage of our 
understanding of how this quantity behaves when we replace u with its scaled version 
αu and when we replace v with its scaled version βv. Here α, β ∈ C are arbitrary. 
The quantity in question transforms as 

$$|\langle \alpha u, \beta v \rangle| = |\alpha| \, |\beta| \, |\langle u, v \rangle|. \tag{3.18}$$


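We now use (3.16) to upper-bound the left-hand side (LHS) of the above by 
substituting αu and βv for u and v in (3.16) to obtain 

$$|\alpha| \, |\beta| \, |\langle u, v \rangle| = |\langle \alpha u, \beta v \rangle| \le \frac{1}{2} |\alpha|^2 \|u\|_2^2 + \frac{1}{2} |\beta|^2 \|v\|_2^2, \quad \alpha, \beta \in \mathbb{C}. \tag{3.19}$$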


If both ‖u‖₂ and ‖v‖₂ are positive, then (3.17) follows from (3.19) by choosing 
α = 1/‖u‖₂ and β = 1/‖v‖₂. To conclude the proof it thus remains to show that 
(3.17) also holds when either ‖u‖₂ or ‖v‖₂ is zero so the RHS of (3.17) is zero. 
That is, we need to show that if either ‖u‖₂ or ‖v‖₂ is zero, then ⟨u, v⟩ must also 
be zero. To show this, suppose first that ‖u‖₂ is zero. By substituting α = 1 in 
(3.19) we obtain in this case that 


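$$|\beta| \, |\langle u, v \rangle| \le \frac{1}{2} |\beta|^2 \|v\|_2^2,$$

which, upon dividing by |β|, yields 

$$|\langle u, v \rangle| \le \frac{1}{2} |\beta| \, \|v\|_2^2, \quad \beta \ne 0.$$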


Upon letting |β| tend to zero from above this demonstrates that ⟨u, v⟩ must be zero 
as we set out to prove. (As an alternative proof of this case one notes that ‖u‖₂ = 0 
implies, by Proposition 2.5.3, that the set {t ∈ R: u(t) ≠ 0} is of Lebesgue measure 
zero. Consequently, since every zero of t ↦ u(t) is also a zero of t ↦ u(t) v*(t), 
it follows that {t ∈ R: u(t) v*(t) ≠ 0} is included in {t ∈ R: u(t) ≠ 0}, and 
must therefore also be of Lebesgue measure zero (Exercise 2.6). Consequently, by 
Proposition 2.5.3, ∫_{−∞}^{∞} |u(t) v*(t)| dt must be zero, which, by Proposition 2.4.1, 
implies that |⟨u, v⟩| must be zero.) 


The case where ‖v‖₂ = 0 is very similar: by substituting β = 1 in (3.19) we obtain 
that (in this case) 

$$|\langle u, v \rangle| \le \frac{1}{2} |\alpha| \, \|u\|_2^2, \quad \alpha \ne 0,$$

and the result follows upon letting |α| tend to zero from above. 
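
The relation between the weaker bound (3.16) and the Cauchy-Schwarz bound (3.17), and the effect of the scaling step in the proof, can be illustrated numerically. The sketch below is only a sanity check under assumptions of our choosing: the helpers `inner` and `norm2`, the two test signals, and the interval [0, 1] are hypothetical, and integrals are approximated by midpoint Riemann sums.

```python
import math

def inner(u, v, a, b, n=4000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [a, b]."""
    h = (b - a) / n
    mids = (a + (k + 0.5) * h for k in range(n))
    return h * sum(u(t) * v(t).conjugate() for t in mids)

def norm2(u, a, b):
    """Square root of the (approximate) energy of u on [a, b]."""
    return math.sqrt(inner(u, u, a, b).real)

# Hypothetical test signals, taken to be zero outside [0, 1].
u = lambda t: complex(t + 2.0, 0.0)
v = lambda t: complex(math.cos(3.0 * t), math.sin(t))

nu, nv = norm2(u, 0.0, 1.0), norm2(v, 0.0, 1.0)
lhs = abs(inner(u, v, 0.0, 1.0))
cauchy_schwarz = nu * nv                  # RHS of (3.17)
bound_3_16 = 0.5 * (nu ** 2 + nv ** 2)    # RHS of (3.16)

# Scaling step of the proof: normalize both signals to unit energy, then (3.16) gives at most 1.
u1 = lambda t: u(t) / nu
v1 = lambda t: v(t) / nv
normalized_lhs = abs(inner(u1, v1, 0.0, 1.0))

print(lhs <= cauchy_schwarz <= bound_3_16, normalized_lhs <= 1.0)
```

After normalization the RHS of (3.16) equals one, and undoing the scaling via (3.18) turns this into (3.17).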


While we shall not use the following inequality in this book, it is sufficiently im- 
portant that we mention it in passing. 


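Theorem 3.3.2 (Hölder's Inequality). If u: R → C and v: R → C are Lebesgue 
measurable functions satisfying 

$$\int_{-\infty}^{\infty} |u(t)|^p \, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |v(t)|^q \, dt < \infty$$

for some 1 < p, q < ∞ satisfying 1/p + 1/q = 1, then the function t ↦ u(t) v*(t) is 
integrable and 

$$\left| \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \right| \le \left( \int_{-\infty}^{\infty} |u(t)|^p \, dt \right)^{1/p} \left( \int_{-\infty}^{\infty} |v(t)|^q \, dt \right)^{1/q}. \tag{3.20}$$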


Note that the Cauchy-Schwarz Inequality corresponds to the case where p = q = 2. 


Proof. See, for example, (Rudin, 1974, Theorem 3.5) or (Royden, 1988, Section 
6.2). 


3.4 Applications 


There are numerous applications of the Cauchy-Schwarz Inequality. Here we only 
mention a few. The first relates the energy in the superposition of two signals to 
the energies of the individual signals. The result holds for both complex-valued and 
real-valued functions, and—as is our custom—we shall thus not make the range 
explicit. 


Proposition 3.4.1 (Triangle Inequality for ℒ₂). If u and v are in ℒ₂, then 

$$\|u + v\|_2 \le \|u\|_2 + \|v\|_2. \tag{3.21}$$


Proof. The proof is a straightforward application of the Cauchy-Schwarz Inequality 
and the basic properties of the inner product (3.6)–(3.9): 

$$\begin{aligned}
\|u + v\|_2^2 &= \langle u + v, u + v \rangle \\
&= \langle u, u \rangle + \langle v, v \rangle + \langle u, v \rangle + \langle v, u \rangle \\
&\le \langle u, u \rangle + \langle v, v \rangle + |\langle u, v \rangle| + |\langle v, u \rangle| \\
&= \|u\|_2^2 + \|v\|_2^2 + 2 |\langle u, v \rangle| \\
&\le \|u\|_2^2 + \|v\|_2^2 + 2 \|u\|_2 \|v\|_2 \\
&= \bigl( \|u\|_2 + \|v\|_2 \bigr)^2,
\end{aligned}$$

from which the result follows by taking square roots. Here the first line follows 
from the definition of ‖·‖₂ (3.11); the second by (3.9) & (3.10); the third by the 
Triangle Inequality for Complex Numbers (2.12); the fourth because, by (3.6), 
⟨v, u⟩ is the complex conjugate of ⟨u, v⟩ and is hence of equal modulus; the fifth 
by the Cauchy-Schwarz Inequality; and the sixth by simple algebra. 
Another important mathematical consequence of the Cauchy-Schwarz Inequality is 
the continuity of the inner product. To state the result we use the notation a_n → a 
to indicate that the sequence a_1, a_2, ... converges to a, i.e., that lim_{n→∞} a_n = a. 


Proposition 3.4.2 (Continuity of the Inner Product). Let u and v be in ℒ₂. If 
the sequence u_1, u_2, ... of elements of ℒ₂ satisfies 

$$\|u_n - u\|_2 \to 0,$$

and if the sequence v_1, v_2, ... of elements of ℒ₂ satisfies 

$$\|v_n - v\|_2 \to 0,$$

then 

$$\langle u_n, v_n \rangle \to \langle u, v \rangle.$$

Proof. 

$$\begin{aligned}
|\langle u_n, v_n \rangle - \langle u, v \rangle| &= |\langle u_n - u, v \rangle + \langle u_n - u, v_n - v \rangle + \langle u, v_n - v \rangle| \\
&\le |\langle u_n - u, v \rangle| + |\langle u_n - u, v_n - v \rangle| + |\langle u, v_n - v \rangle| \\
&\le \|u_n - u\|_2 \|v\|_2 + \|u_n - u\|_2 \|v_n - v\|_2 + \|u\|_2 \|v_n - v\|_2 \\
&\to 0,
\end{aligned}$$

where the first equality follows from the basic properties of the inner product (3.6)–(3.10); 
the subsequent inequality by the Triangle Inequality for Complex Numbers 
(2.12); the subsequent inequality from the Cauchy-Schwarz Inequality; and where 
the final limit follows from the proposition's hypotheses. 


Another useful consequence of the Cauchy-Schwarz Inequality is in demonstrating 
that if a signal is energy-limited and is zero outside an interval, then it is also 
integrable. 


Proposition 3.4.3 (Finite-Energy Functions over Finite Intervals are Integrable). 
If for some real numbers a and b satisfying a < b we have 

$$\int_{a}^{b} |x(\xi)|^2 \, d\xi < \infty,$$

then 

$$\int_{a}^{b} |x(\xi)| \, d\xi \le \sqrt{b - a} \, \sqrt{\int_{a}^{b} |x(\xi)|^2 \, d\xi},$$

and, in particular, 

$$\int_{a}^{b} |x(\xi)| \, d\xi < \infty.$$
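Proof. 

$$\begin{aligned}
\int_{a}^{b} |x(\xi)| \, d\xi &= \int_{-\infty}^{\infty} \mathrm{I}\{a \le \xi \le b\} \, |x(\xi)| \, d\xi \\
&= \int_{-\infty}^{\infty} \underbrace{\mathrm{I}\{a \le \xi \le b\}}_{u(\xi)} \; \underbrace{\mathrm{I}\{a \le \xi \le b\} \, |x(\xi)|}_{v(\xi)} \, d\xi \\
&\le \sqrt{b - a} \, \sqrt{\int_{a}^{b} |x(\xi)|^2 \, d\xi},
\end{aligned}$$

where the inequality is just an application of the Cauchy-Schwarz Inequality to the 
function ξ ↦ I{a ≤ ξ ≤ b} |x(ξ)| and the indicator function ξ ↦ I{a ≤ ξ ≤ b}. 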
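
As a quick check of Proposition 3.4.3, consider the illustrative choice a = 0, b = 1, and x(ξ) = ξ^{−1/4} for 0 < ξ ≤ 1 (an example assumed here for concreteness, not one taken from the text). The signal is unbounded near the origin, yet its energy over the interval is finite and the bound holds:

```latex
\int_{0}^{1} \xi^{-1/4} \, d\xi = \frac{4}{3},
\qquad
\sqrt{1 - 0} \, \sqrt{\int_{0}^{1} \xi^{-1/2} \, d\xi} = \sqrt{2} \approx 1.414,
\qquad
\frac{4}{3} \le \sqrt{2}.
```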


Note that, in general, an energy-limited signal need not be integrable. For example, 
the real signal 

$$t \mapsto \begin{cases} 0 & \text{if } t \le 1, \\ 1/t & \text{otherwise,} \end{cases} \tag{3.22}$$

is of finite energy but is not integrable. 
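
Both claims can be verified directly, assuming that the signal in (3.22), here called u for convenience, is zero for t ≤ 1 and equals 1/t for t > 1:

```latex
\int_{-\infty}^{\infty} u^2(t) \, dt
  = \int_{1}^{\infty} \frac{dt}{t^2}
  = \lim_{T \to \infty} \Bigl( 1 - \frac{1}{T} \Bigr) = 1 < \infty,
\qquad
\int_{-\infty}^{\infty} |u(t)| \, dt
  = \int_{1}^{\infty} \frac{dt}{t}
  = \lim_{T \to \infty} \ln T = \infty.
```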


The Cauchy-Schwarz Inequality demonstrates that if both u and v are of finite 
energy, then their inner product ⟨u, v⟩ is well-defined, i.e., the integrand in (3.4) is 
integrable. It can also be used in slightly more sophisticated ways. For example, it 
can be used to treat cases where one of the functions, say u, is not of finite energy 
but where the second function decays to zero sufficiently quickly to compensate for 
that. For example: 


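Proposition 3.4.4. If the Lebesgue measurable functions x: R → C and y: R → C 
satisfy 

$$\int_{-\infty}^{\infty} \frac{|x(t)|^2}{t^2 + 1} \, dt < \infty$$

and 

$$\int_{-\infty}^{\infty} |y(t)|^2 \, (t^2 + 1) \, dt < \infty,$$

then the function t ↦ x(t) y*(t) is integrable and 

$$\left| \int_{-\infty}^{\infty} x(t) \, y^*(t) \, dt \right| \le \sqrt{\int_{-\infty}^{\infty} \frac{|x(t)|^2}{t^2 + 1} \, dt} \, \sqrt{\int_{-\infty}^{\infty} |y(t)|^2 \, (t^2 + 1) \, dt}.$$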


Proof. This is a simple application of the Cauchy-Schwarz Inequality to the 
functions t ↦ x(t)/√(t² + 1) and t ↦ y(t) √(t² + 1). Simply write 

$$\int_{-\infty}^{\infty} x(t) \, y^*(t) \, dt = \int_{-\infty}^{\infty} \underbrace{\frac{x(t)}{\sqrt{t^2 + 1}}}_{u(t)} \; \underbrace{\sqrt{t^2 + 1} \, y^*(t)}_{v^*(t)} \, dt$$

and apply the Cauchy-Schwarz Inequality to the functions u(·) and v(·). 
3.5 The Cauchy-Schwarz Inequality for Random Variables 


There is also a version of the Cauchy-Schwarz Inequality for random variables. It is 
very similar to Theorem 3.3.1 but with time integrals replaced by expectations. We 
denote the expectation of the random variable X by E[X] and remind the reader 
that the variance Var[X] of the random variable X is defined by 


$$\mathrm{Var}[X] = \mathrm{E}\bigl[ (X - \mathrm{E}[X])^2 \bigr]. \tag{3.23}$$


Theorem 3.5.1 (Cauchy-Schwarz Inequality for Random Variables). Let the 
random variables U and V be of finite variance. Then 

$$|\mathrm{E}[UV]| \le \sqrt{\mathrm{E}[U^2]} \, \sqrt{\mathrm{E}[V^2]}, \tag{3.24}$$

with equality if, and only if, Pr[αU = βV] = 1 for some real α and β that are not 
both equal to zero. 


Proof. Use the proof of Theorem 3.3.1 with all time integrals replaced with ex- 
pectations. For a different proof and for the conditions for equality see (Grimmett 
and Stirzaker, 2001, Chapter 3, Section 3.5, Theorem 9). 


For the next corollary we need to recall that the covariance Cov[U, V] between the 
finite-variance random variables U, V is defined by 


Cov[U, V] = E[(U — E[U])(V — E[V])]. (3.25) 


Corollary 3.5.2 (Covariance Inequality). If the random variables U and V are of 
finite variance Var[U] and Var[V], then 

$$|\mathrm{Cov}[U, V]| \le \sqrt{\mathrm{Var}[U]} \, \sqrt{\mathrm{Var}[V]}. \tag{3.26}$$


Proof. Apply Theorem 3.5.1 to the random variables U — E[U] and V — E[V]. 


Corollary 3.5.2 shows that the correlation coefficient, which is defined for 
random variables U and V having strictly positive variances as 

$$\rho \triangleq \frac{\mathrm{Cov}[U, V]}{\sqrt{\mathrm{Var}[U]} \, \sqrt{\mathrm{Var}[V]}}, \tag{3.27}$$

satisfies 

$$-1 \le \rho \le +1. \tag{3.28}$$
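
The bound (3.28) also holds for sample estimates, because averaging over a finite sample is itself an expectation (with respect to the empirical distribution). The following is a minimal sketch under that assumption; the helper name `correlation`, the Gaussian samples, and the seed are illustrative choices, not part of the text.

```python
import math
import random

def correlation(us, vs):
    """Sample correlation coefficient: empirical covariance over the product of standard deviations."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(us, vs)) / n
    var_u = sum((a - mean_u) ** 2 for a in us) / n
    var_v = sum((b - mean_v) ** 2 for b in vs) / n
    return cov / (math.sqrt(var_u) * math.sqrt(var_v))

rng = random.Random(0)                                  # fixed seed for reproducibility
us = [rng.gauss(0.0, 1.0) for _ in range(1000)]
vs = [0.5 * a + rng.gauss(0.0, 1.0) for a in us]        # partially correlated with us

rho = correlation(us, vs)
rho_same = correlation(us, us)                          # perfectly correlated pair
rho_flip = correlation(us, [-2.0 * a for a in us])      # perfectly anticorrelated pair

print(-1.0 <= rho <= 1.0)
```

The extreme values ±1 are approached exactly when one sample is an affine function of the other, matching the equality condition of Theorem 3.5.1 applied to the centered variables.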


3.6 Mathematical Comments 


(i) Mathematicians typically consider (u,v) only when both u and v are of finite 
energy. We are more forgiving and simply require that the integral defining 
the inner product be well-defined, i.e., that the integrand be integrable. 
(ii) Some refer to ‖u‖₂ as the “norm of u” or the “ℒ₂ norm of u.” We shall 
refrain from this usage because mathematicians use the term “norm” very 
selectively. They require that no function other than the all-zero function be 
of zero norm, and this is not the case for ‖·‖₂. Indeed, any function u that is 
indistinguishable from the all-zero function satisfies ‖u‖₂ = 0, and there are 
many such functions (e.g., the function that is equal to one at rational times 
and that is equal to zero at all other times). This difficulty can be overcome 
by defining two functions to be the same if their difference is of zero energy. 
In this case ‖·‖₂ is a norm in the mathematical sense and is, in fact, what 
mathematicians call the L₂ norm. This issue is discussed in greater detail in 
Section 4.7. To stay out of trouble we shall refrain from giving ‖·‖₂ a name. 


3.7 Exercises 


Exercise 3.1 (Manipulating Inner Products). Show that if u, v, and w are energy-limited 
complex signals, then 


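$$\langle u + v, \, 3u + v + iw \rangle = 3 \|u\|_2^2 + \|v\|_2^2 + \langle u, v \rangle + 3 \langle u, v \rangle^* - i \langle u, w \rangle - i \langle v, w \rangle.$$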


Exercise 3.2 (Orthogonality to All Signals). Let u be an energy-limited signal. Show 
that 


$$(u \equiv 0) \iff \bigl( \langle u, v \rangle = 0, \quad v \in \mathcal{L}_2 \bigr).$$


Exercise 3.3 (Finite-Energy Signals). Let x be an energy-limited signal. 


(i) Show that, for every t_0 ∈ R, the signal t ↦ x(t − t_0) must also be energy-limited. 

(ii) Show that the reflection of x is also energy-limited. I.e., show that the signal $\overleftarrow{x}$ 
that maps t to x(−t) is energy-limited. 

(iii) How are the energies in t ↦ x(t), t ↦ x(t − t_0), and t ↦ x(−t) related? 

Exercise 3.4 (Inner Products of Mirror Images). Express the inner product $\langle \overleftarrow{x}, \overleftarrow{y} \rangle$ 
of the mirror images t ↦ x(−t) and t ↦ y(−t) in terms of the inner product ⟨x, y⟩. 


Exercise 3.5 (On the Cauchy-Schwarz Inequality). Show that the bound obtained from 
the Cauchy-Schwarz Inequality is at least as tight as (3.16). 


Exercise 3.6 (Truncated Polynomials). Consider the signals u: t ↦ (t + 2) I{0 ≤ t ≤ 1} 
and v: t ↦ (t² − 2t − 3) I{0 ≤ t ≤ 1}. Compute the energies ‖u‖₂² and ‖v‖₂² and the inner 
product ⟨u, v⟩. 


Exercise 3.7 (Indistinguishability and Inner Products). Let u ∈ ℒ₂ be indistinguishable 
from u′ ∈ ℒ₂, and let v ∈ ℒ₂ be indistinguishable from v′ ∈ ℒ₂. Show that the inner 
product ⟨u′, v′⟩ is equal to the inner product ⟨u, v⟩. 
Exercise 3.8 (Finite Energy and Integrability). Let x: R → C be Lebesgue measurable. 

(i) Show that the conditions that x is of finite energy and that the mapping t ↦ t x(t) 
is of finite energy are simultaneously met if, and only if, 

$$\int_{-\infty}^{\infty} |x(t)|^2 \, (1 + t^2) \, dt < \infty. \tag{3.29}$$


(ii) Show that (3.29) implies that x is integrable. 


(iii) Give an example of an integrable signal that does not satisfy (3.29). 


Exercise 3.9 (The Cauchy-Schwarz Inequality for Sequences). 


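(i) Let the complex sequences a_1, a_2, ... and b_1, b_2, ... satisfy 

$$\sum_{\nu=1}^{\infty} |a_\nu|^2, \; \sum_{\nu=1}^{\infty} |b_\nu|^2 < \infty.$$

Show that 

$$\left| \sum_{\nu=1}^{\infty} a_\nu b_\nu^* \right|^2 \le \left( \sum_{\nu=1}^{\infty} |a_\nu|^2 \right) \left( \sum_{\nu=1}^{\infty} |b_\nu|^2 \right).$$

(ii) Derive the Cauchy-Schwarz Inequality for d-tuples: 

$$\left| \sum_{\nu=1}^{d} a_\nu b_\nu^* \right|^2 \le \left( \sum_{\nu=1}^{d} |a_\nu|^2 \right) \left( \sum_{\nu=1}^{d} |b_\nu|^2 \right).$$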


Exercise 3.10 (Summability and Square Summability). Let a_1, a_2, ... be a sequence of 
complex numbers. Show that 

$$\left( \sum_{\nu=1}^{\infty} |a_\nu| < \infty \right) \implies \left( \sum_{\nu=1}^{\infty} |a_\nu|^2 < \infty \right).$$


Exercise 3.11 (A Friendlier GPA). Use the Cauchy-Schwarz Inequality for d-tuples 
(Problem 3.9) to show that for any positive integer d, 

$$\frac{a_1 + \cdots + a_d}{d} \le \sqrt{\frac{a_1^2 + \cdots + a_d^2}{d}}, \quad a_1, \ldots, a_d \in \mathbb{R}.$$


Chapter 4 


The Space ℒ₂ of Energy-Limited Signals 


4.1 Introduction 


In this chapter we shall study the space ℒ₂ of energy-limited signals in greater 
detail. We shall show that its elements can be viewed as vectors in a vector space 
and begin developing a geometric intuition for understanding its structure. We 
shall focus on the case of complex-valued signals, but with some minor changes the 
results are also applicable to real-valued signals. (The main changes that are needed 
for translating the results to real-valued signals are replacing C with R, ignoring 
the conjugation operation, and interpreting |-| as the absolute value function for 
real arguments as opposed to the modulus function.) 


We remind the reader that the space $\mathcal{L}_2$ was defined in Definition 3.1.1 as the set
of all Lebesgue measurable complex-valued signals $u\colon \mathbb{R} \to \mathbb{C}$ satisfying

\[ \int_{-\infty}^{\infty} |u(t)|^2 \, dt < \infty, \tag{4.1} \]

and that in (3.12) we defined for every $\mathbf{u} \in \mathcal{L}_2$ the quantity $\|\mathbf{u}\|_2$ as

\[ \|\mathbf{u}\|_2 = \sqrt{\int_{-\infty}^{\infty} |u(t)|^2 \, dt}. \tag{4.2} \]

We refer to $\mathcal{L}_2$ as the space of energy-limited signals and to its elements as
energy-limited signals or signals of finite energy.


4.2 $\mathcal{L}_2$ as a Vector Space


In this section we shall explain how to view the space $\mathcal{L}_2$ as a vector space over
the complex field by thinking about signals in $\mathcal{L}_2$ as vectors, by interpreting the
superposition $\mathbf{u} + \mathbf{v}$ of two signals as vector-addition, and by interpreting the
amplification of $\mathbf{u}$ by $\alpha$ as the operation of multiplying the vector $\mathbf{u}$ by the scalar
$\alpha \in \mathbb{C}$.


We begin by reminding the reader that the superposition of the two signals $\mathbf{u}$
and $\mathbf{v}$ is denoted by $\mathbf{u} + \mathbf{v}$ and is the signal that maps every $t \in \mathbb{R}$ to $u(t) + v(t)$.
The amplification of $\mathbf{u}$ by $\alpha$ is denoted by $\alpha \mathbf{u}$ and is the signal that maps every
$t \in \mathbb{R}$ to $\alpha u(t)$. More generally, if $\mathbf{u}$ and $\mathbf{v}$ are signals and if $\alpha$ and $\beta$ are complex
numbers, then $\alpha \mathbf{u} + \beta \mathbf{v}$ is the signal $t \mapsto \alpha u(t) + \beta v(t)$.


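If $\mathbf{u} \in \mathcal{L}_2$ and $\alpha \in \mathbb{C}$, then $\alpha \mathbf{u}$ is also in $\mathcal{L}_2$. Indeed, the measurability of $\mathbf{u}$ implies
the measurability of $\alpha \mathbf{u}$, and if $\mathbf{u}$ is of finite energy, then $\alpha \mathbf{u}$ is also of finite energy,
because the energy in $\alpha \mathbf{u}$ is the product of $|\alpha|^2$ by the energy in $\mathbf{u}$. We thus see
that the operation of amplification of $\mathbf{u}$ by $\alpha$ results in an element of $\mathcal{L}_2$ whenever
$\mathbf{u} \in \mathcal{L}_2$ and $\alpha \in \mathbb{C}$.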


We next show that if the signals $\mathbf{u}$ and $\mathbf{v}$ are in $\mathcal{L}_2$, then their superposition
$\mathbf{u} + \mathbf{v}$ must also be in $\mathcal{L}_2$. This holds because a standard result in Measure Theory
guarantees that the superposition of two Lebesgue measurable signals is a Lebesgue
measurable signal and because Proposition 3.4.1 guarantees that if both $\mathbf{u}$ and $\mathbf{v}$
are of finite energy, then so is their superposition. Thus the superposition that
maps $\mathbf{u}$ and $\mathbf{v}$ to $\mathbf{u} + \mathbf{v}$ results in an element of $\mathcal{L}_2$ whenever $\mathbf{u}, \mathbf{v} \in \mathcal{L}_2$.


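It can be readily verified that the following properties hold:


(i) commutativity:
\[ \mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}, \qquad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2; \]

(ii) associativity:
\[ (\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w}), \qquad \mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{L}_2, \]
\[ (\alpha \beta) \mathbf{u} = \alpha (\beta \mathbf{u}), \qquad (\alpha, \beta \in \mathbb{C}, \; \mathbf{u} \in \mathcal{L}_2); \]

(iii) additive identity: the all-zero signal $\mathbf{0}\colon t \mapsto 0$ satisfies
\[ \mathbf{0} + \mathbf{u} = \mathbf{u}, \qquad \mathbf{u} \in \mathcal{L}_2; \]

(iv) additive inverse: to every $\mathbf{u} \in \mathcal{L}_2$ there corresponds a signal $\mathbf{w} \in \mathcal{L}_2$
(namely, the signal $t \mapsto -u(t)$) such that
\[ \mathbf{u} + \mathbf{w} = \mathbf{0}; \]

(v) multiplicative identity:
\[ 1 \mathbf{u} = \mathbf{u}, \qquad \mathbf{u} \in \mathcal{L}_2; \]

(vi) distributive properties:
\[ \alpha (\mathbf{u} + \mathbf{v}) = \alpha \mathbf{u} + \alpha \mathbf{v}, \qquad (\alpha \in \mathbb{C}, \; \mathbf{u}, \mathbf{v} \in \mathcal{L}_2), \]
\[ (\alpha + \beta) \mathbf{u} = \alpha \mathbf{u} + \beta \mathbf{u}, \qquad (\alpha, \beta \in \mathbb{C}, \; \mathbf{u} \in \mathcal{L}_2). \]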


We conclude that with the operations of superposition and amplification the set $\mathcal{L}_2$
forms a vector space over the complex field (Axler, 1997, Chapter 1). This justifies
referring to the elements of $\mathcal{L}_2$ as “vectors,” to the operation of signal superposition
as “vector addition,” and to the operation of amplification of an element of $\mathcal{L}_2$ by
a complex scalar as “scalar multiplication.”
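
As an informal numerical illustration of the closure properties above, the following Python sketch approximates energies with a midpoint Riemann sum. The helper names (`energy`, `superpose`, `amplify`), the truncation interval, and the two test signals are assumptions made for this illustration only; they do not come from the text, and a numerical check is of course no substitute for the argument above.

```python
import math

def energy(u, T=30.0, n=60000):
    """Approximate the integral of |u(t)|^2 over [-T, T] by a midpoint Riemann sum."""
    dt = 2 * T / n
    return sum(abs(u(-T + (k + 0.5) * dt)) ** 2 for k in range(n)) * dt

def superpose(u, v):
    """Return the signal t -> u(t) + v(t)."""
    return lambda t: u(t) + v(t)

def amplify(alpha, u):
    """Return the signal t -> alpha * u(t)."""
    return lambda t: alpha * u(t)

# Two illustrative finite-energy signals (chosen for this sketch only).
u = lambda t: math.exp(-abs(t))            # energy 1
v = lambda t: 1j * t * math.exp(-abs(t))   # energy 1/2
alpha = 2 - 1j                             # |alpha|^2 = 5

energy_u = energy(u)
energy_alpha_u = energy(amplify(alpha, u))   # should equal |alpha|^2 * energy_u
energy_sum = energy(superpose(u, v))         # finite, as the superposition is in L2

print(energy_u, energy_alpha_u, energy_sum)
```

The second printed value should be five times the first, in line with the observation that the energy in the amplified signal is the product of $|\alpha|^2$ by the energy in the original signal.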
4.3 Subspace, Dimension, and Basis


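Once we have noted that $\mathcal{L}_2$ together with the operations of superposition and
amplification forms a vector space, we can borrow numerous definitions and results
from the theory of vector spaces. Here we shall focus on the very basic ones.


A linear subspace (or just subspace) of $\mathcal{L}_2$ is a nonempty subset $\mathcal{U}$ of $\mathcal{L}_2$ that
is closed under superposition

\[ \mathbf{u}_1 + \mathbf{u}_2 \in \mathcal{U}, \qquad \mathbf{u}_1, \mathbf{u}_2 \in \mathcal{U} \tag{4.3} \]

and under amplification

\[ \alpha \mathbf{u} \in \mathcal{U}, \qquad (\alpha \in \mathbb{C}, \; \mathbf{u} \in \mathcal{U}). \tag{4.4} \]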
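Example 4.3.1. Consider the set of all functions of the form

\[ t \mapsto p(t) \, e^{-|t|}, \]

where $p(t)$ is any polynomial of degree no larger than 3. Thus, the set is the set of
all functions of the form

\[ t \mapsto \bigl( \alpha_0 + \alpha_1 t + \alpha_2 t^2 + \alpha_3 t^3 \bigr) \, e^{-|t|}, \tag{4.5} \]

where $\alpha_0, \alpha_1, \alpha_2, \alpha_3$ are arbitrary complex numbers.


In spite of the polynomial growth of the pre-exponent, all such functions are in $\mathcal{L}_2$
because the exponential decay more than compensates for the polynomial growth.
The above set is thus a subset of $\mathcal{L}_2$. Moreover, as we show next, this is a linear
subspace of $\mathcal{L}_2$.


If $\mathbf{u}$ is of the form (4.5), then so is $\alpha \mathbf{u}$, because $\alpha \mathbf{u}$ is the mapping

\[ t \mapsto \bigl( \alpha \alpha_0 + \alpha \alpha_1 t + \alpha \alpha_2 t^2 + \alpha \alpha_3 t^3 \bigr) \, e^{-|t|}, \]

which is of the same form.


Similarly, if $\mathbf{u}$ is as given in (4.5) and

\[ \mathbf{v}\colon t \mapsto \bigl( \beta_0 + \beta_1 t + \beta_2 t^2 + \beta_3 t^3 \bigr) \, e^{-|t|}, \]

then $\mathbf{u} + \mathbf{v}$ is the mapping

\[ t \mapsto \bigl( (\alpha_0 + \beta_0) + (\alpha_1 + \beta_1) t + (\alpha_2 + \beta_2) t^2 + (\alpha_3 + \beta_3) t^3 \bigr) \, e^{-|t|}, \]

which is again of this form.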


An n-tuple of vectors from $\mathcal{L}_2$ is a (possibly empty) ordered list of $n$ vectors
from $\mathcal{L}_2$ separated by commas and enclosed in parentheses, e.g., $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$. Here
$n \ge 0$ can be any nonnegative integer, where the case $n = 0$ corresponds to the
empty list.


A vector $\mathbf{v} \in \mathcal{L}_2$ is said to be a linear combination of the n-tuple $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ if
it is equal to

\[ \alpha_1 \mathbf{v}_1 + \cdots + \alpha_n \mathbf{v}_n, \tag{4.6} \]

which is written more succinctly as

\[ \sum_{\nu=1}^{n} \alpha_\nu \mathbf{v}_\nu, \tag{4.7} \]

for some scalars $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$. The all-zero signal is a linear combination of any
n-tuple including the empty tuple.


The span of an n-tuple $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ of vectors in $\mathcal{L}_2$ is denoted by
$\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n)$
and is the set of all vectors in $\mathcal{L}_2$ that are linear combinations of $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$:

\[ \operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n) = \bigl\{ \alpha_1 \mathbf{v}_1 + \cdots + \alpha_n \mathbf{v}_n : \alpha_1, \ldots, \alpha_n \in \mathbb{C} \bigr\}. \tag{4.8} \]

(The span of the empty tuple is given by the one-element set $\{\mathbf{0}\}$ containing the
all-zero signal only.)


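Note that for any n-tuple of vectors $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ in $\mathcal{L}_2$ we have that $\operatorname{span}(\mathbf{v}_1, \ldots, \mathbf{v}_n)$
is a linear subspace of $\mathcal{L}_2$. Also, if $\mathcal{U}$ is a linear subspace of $\mathcal{L}_2$ and if the vectors
$\mathbf{u}_1, \ldots, \mathbf{u}_n$ are in $\mathcal{U}$, then $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_n)$ is a linear subspace which is contained
in $\mathcal{U}$. A subspace $\mathcal{U}$ of $\mathcal{L}_2$ is said to be finite-dimensional if there exists an
n-tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_n)$ of vectors in $\mathcal{U}$ such that $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_n) = \mathcal{U}$. Otherwise,
we say that $\mathcal{U}$ is infinite-dimensional. For example, the space of all mappings
of the form $t \mapsto p(t) \, e^{-|t|}$ for some polynomial $p(\cdot)$ can be shown to be
infinite-dimensional, but under the restriction that $p(\cdot)$ be of degree smaller than 5, it is
finite-dimensional. If $\mathcal{U}$ is a finite-dimensional subspace and if $\mathcal{U}'$ is a subspace
contained in $\mathcal{U}$, then $\mathcal{U}'$ must also be finite-dimensional.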


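An n-tuple of signals $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ in $\mathcal{L}_2$ is said to be linearly independent if
whenever the scalars $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$ are such that $\alpha_1 \mathbf{v}_1 + \cdots + \alpha_n \mathbf{v}_n = \mathbf{0}$, we have
$\alpha_1 = \cdots = \alpha_n = 0$. I.e., if

\[ \Bigl( \sum_{\nu=1}^{n} \alpha_\nu \mathbf{v}_\nu = \mathbf{0} \Bigr) \implies \bigl( \alpha_\nu = 0, \quad \nu = 1, \ldots, n \bigr). \tag{4.9} \]

(By convention, the empty tuple is linearly independent.) For example, the 3-tuple
consisting of the signals $t \mapsto e^{-|t|}$, $t \mapsto t \, e^{-|t|}$, and $t \mapsto t^2 e^{-|t|}$ is linearly
independent. If $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$ is not linearly independent, then we say that it is
linearly dependent. For example, the 3-tuple consisting of the signals $t \mapsto e^{-|t|}$,
$t \mapsto t \, e^{-|t|}$, and $t \mapsto (2t + 1) \, e^{-|t|}$ is linearly dependent. The n-tuple $(\mathbf{v}_1, \ldots, \mathbf{v}_n)$
is linearly dependent if, and only if, (at least) one of the signals in the tuple can
be written as a linear combination of the others.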
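
The two 3-tuples in the examples above can be checked mechanically. A signal of the form $t \mapsto p(t) \, e^{-|t|}$ with $p$ of degree at most 2 is determined by the three coefficients of $p$, because a polynomial that vanishes for every real $t$ has all its coefficients equal to zero. Linear independence of such signals is therefore the same as linear independence of their coefficient rows, which for three rows can be tested with a determinant. The sketch below is only an illustration; the helper names `det3` and `is_linearly_independent` are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def is_linearly_independent(rows):
    """Three coefficient rows are linearly independent iff their determinant is nonzero."""
    return det3(rows) != 0

# Each row lists the coefficients of 1, t, t^2 in the polynomial p(t)
# of a signal t -> p(t) * exp(-|t|).
independent_tuple = [
    [1, 0, 0],   # exp(-|t|)
    [0, 1, 0],   # t * exp(-|t|)
    [0, 0, 1],   # t^2 * exp(-|t|)
]
dependent_tuple = [
    [1, 0, 0],   # exp(-|t|)
    [0, 1, 0],   # t * exp(-|t|)
    [1, 2, 0],   # (2t + 1) * exp(-|t|) = 1 * first + 2 * second
]

print(is_linearly_independent(independent_tuple))
print(is_linearly_independent(dependent_tuple))
```

Integer coefficients are used so that the determinant is computed exactly; with floating-point coefficients a tolerance would be needed instead of an exact comparison with zero.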


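The d-tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ is said to form a basis for the linear subspace $\mathcal{U}$ if it is
linearly independent and if $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_d) = \mathcal{U}$. The latter condition is equivalent
to the requirement that every $\mathbf{u} \in \mathcal{U}$ can be represented as

\[ \mathbf{u} = \alpha_1 \mathbf{u}_1 + \cdots + \alpha_d \mathbf{u}_d \tag{4.10} \]

for some $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$. The former condition that the tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ be
linearly independent guarantees that if such a representation exists, then it is
unique. Thus, $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ forms a basis for $\mathcal{U}$ if $\mathbf{u}_1, \ldots, \mathbf{u}_d \in \mathcal{U}$ (thus guaranteeing
that $\operatorname{span}(\mathbf{u}_1, \ldots, \mathbf{u}_d) \subseteq \mathcal{U}$) and if every $\mathbf{u} \in \mathcal{U}$ can be written uniquely as in (4.10).
Every finite-dimensional linear subspace $\mathcal{U}$ has a basis, and all bases for $\mathcal{U}$ have the
same number of elements. This number is called the dimension of $\mathcal{U}$. Thus, if $\mathcal{U}$
is a finite-dimensional subspace and if both $(\mathbf{u}_1, \ldots, \mathbf{u}_d)$ and $(\mathbf{u}'_1, \ldots, \mathbf{u}'_{d'})$ form a
basis for $\mathcal{U}$, then $d = d'$ and both are equal to the dimension of $\mathcal{U}$. The dimension
of the subspace $\{\mathbf{0}\}$ is zero.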


4.4 $\|\mathbf{u}\|_2$ as the “length” of the Signal $u(\cdot)$


Having presented the elements of $\mathcal{L}_2$ as vectors, we next propose to view $\|\mathbf{u}\|_2$ as
the “length” of the vector $\mathbf{u} \in \mathcal{L}_2$. To motivate this view, we first present the key
properties of $\|\cdot\|_2$.


Proposition 4.4.1 (Properties of $\|\cdot\|_2$). Let $\mathbf{u}$ and $\mathbf{v}$ be elements of $\mathcal{L}_2$, and let $\alpha$
be some complex number. Then

\[ \|\alpha \mathbf{u}\|_2 = |\alpha| \, \|\mathbf{u}\|_2, \tag{4.11} \]

\[ \|\mathbf{u} + \mathbf{v}\|_2 \le \|\mathbf{u}\|_2 + \|\mathbf{v}\|_2, \tag{4.12} \]

and

\[ \bigl( \|\mathbf{u}\|_2 = 0 \bigr) \iff \bigl( \mathbf{u} \equiv \mathbf{0} \bigr). \tag{4.13} \]


Proof. Identity (4.11) follows directly from the definition of $\|\cdot\|_2$; see (4.2).
Inequality (4.12) is a restatement of Proposition 3.4.1. The equivalence of the
condition $\|\mathbf{u}\|_2 = 0$ and the condition that $\mathbf{u}$ is indistinguishable from the all-zero
signal $\mathbf{0}$ follows from Proposition 2.5.3.


Identity (4.11) is in agreement with our intuition that stretching a vector merely
scales its length. Inequality (4.12) is sometimes called the Triangle Inequality
because it is reminiscent of the theorem from planar geometry that states that the
length of no side of a triangle can exceed the sum of the lengths of the others; see
Figure 4.1.


Substituting $-\mathbf{y}$ for $\mathbf{u}$ and $\mathbf{x} + \mathbf{y}$ for $\mathbf{v}$ in (4.12) yields $\|\mathbf{x}\|_2 \le \|\mathbf{y}\|_2 + \|\mathbf{x} + \mathbf{y}\|_2$,
i.e., the inequality $\|\mathbf{x} + \mathbf{y}\|_2 \ge \|\mathbf{x}\|_2 - \|\mathbf{y}\|_2$. And substituting $-\mathbf{x}$ for $\mathbf{u}$ and $\mathbf{x} + \mathbf{y}$
for $\mathbf{v}$ in (4.12) yields the inequality $\|\mathbf{y}\|_2 \le \|\mathbf{x}\|_2 + \|\mathbf{x} + \mathbf{y}\|_2$, i.e., the inequality
$\|\mathbf{x} + \mathbf{y}\|_2 \ge \|\mathbf{y}\|_2 - \|\mathbf{x}\|_2$. Combining the two inequalities we obtain the inequality
$\|\mathbf{x} + \mathbf{y}\|_2 \ge \bigl| \|\mathbf{x}\|_2 - \|\mathbf{y}\|_2 \bigr|$. This inequality can be combined with the inequality
$\|\mathbf{x} + \mathbf{y}\|_2 \le \|\mathbf{x}\|_2 + \|\mathbf{y}\|_2$ in the compact form of a double-sided inequality

\[ \bigl| \|\mathbf{x}\|_2 - \|\mathbf{y}\|_2 \bigr| \le \|\mathbf{x} + \mathbf{y}\|_2 \le \|\mathbf{x}\|_2 + \|\mathbf{y}\|_2, \qquad \mathbf{x}, \mathbf{y} \in \mathcal{L}_2. \tag{4.14} \]


Finally, (4.13) “almost” supports the intuition that the only vector of length zero
is the zero-vector. In our case, alas, we can only claim that if a vector is of zero
length, then it is indistinguishable from the all-zero signal, i.e., that all $t$'s outside
a set of Lebesgue measure zero are mapped by the signal to zero.
Figure 4.1: A geometric interpretation of the Triangle Inequality for energy-limited
signals: $\|\mathbf{u} + \mathbf{v}\|_2 \le \|\mathbf{u}\|_2 + \|\mathbf{v}\|_2$.


Figure 4.2: Illustration of the shortest path property in $\mathcal{L}_2$. The shortest path
from A to B is no longer than the sum of the shortest path from A to C and the
shortest path from C to B.


The Triangle Inequality (4.12) can also be stated slightly differently. In planar
geometry the sum of the lengths of two sides of a triangle can never be smaller
than the length of the remaining side. Thus, the shortest path from Point A to
Point B cannot exceed the sum of the lengths of the shortest paths from Point A to
Point C, and from Point C to Point B. By applying Inequality (4.12) to the signals
$\mathbf{u} - \mathbf{w}$ and $\mathbf{w} - \mathbf{v}$ we obtain

\[ \|\mathbf{u} - \mathbf{v}\|_2 \le \|\mathbf{u} - \mathbf{w}\|_2 + \|\mathbf{w} - \mathbf{v}\|_2, \qquad \mathbf{u}, \mathbf{v}, \mathbf{w} \in \mathcal{L}_2, \]

i.e., that the distance from $\mathbf{u}$ to $\mathbf{v}$ cannot exceed the sum of distances from $\mathbf{u}$ to $\mathbf{w}$
and from $\mathbf{w}$ to $\mathbf{v}$. See Figure 4.2.
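
The double-sided inequality (4.14) can be spot-checked numerically. The sketch below approximates the norm by a midpoint Riemann sum on a truncated interval; the function name `norm`, the interval, and the two test signals are assumptions of this illustration and are not taken from the text.

```python
import math

def norm(u, T=20.0, n=40000):
    """Approximate the L2 norm of u using a midpoint Riemann sum over [-T, T]."""
    dt = 2 * T / n
    return math.sqrt(sum(abs(u(-T + (k + 0.5) * dt)) ** 2 for k in range(n)) * dt)

# Illustrative signals, chosen only for this sketch.
x = lambda t: math.exp(-abs(t))
y = lambda t: -0.5j * t * math.exp(-abs(t))
x_plus_y = lambda t: x(t) + y(t)

lower = abs(norm(x) - norm(y))    # left side of (4.14)
middle = norm(x_plus_y)           # norm of the superposition
upper = norm(x) + norm(y)         # right side of (4.14)

print(lower, middle, upper)
```

The three printed numbers should appear in nondecreasing order. Because the Riemann sum is itself a weighted sum of squared magnitudes, the discretized norm obeys the same inequalities, so the check does not hinge on the accuracy of the approximation.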
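4.5 Orthogonality and Inner Products


To further develop our geometric view of $\mathcal{L}_2$ we next discuss orthogonality. We
shall motivate its definition with an attempt to generalize Pythagoras’s Theorem
to $\mathcal{L}_2$. As an initial attempt at defining orthogonality we might define two
functions $\mathbf{u}, \mathbf{v} \in \mathcal{L}_2$ to be orthogonal if $\|\mathbf{u} + \mathbf{v}\|_2^2 = \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2$. Recalling the
definition of $\|\cdot\|_2$ (4.2) we obtain that this condition is equivalent to the condition
$\operatorname{Re}\bigl( \int u(t) \, v^*(t) \, dt \bigr) = 0$, because

\begin{align*}
\|\mathbf{u} + \mathbf{v}\|_2^2 &= \int_{-\infty}^{\infty} |u(t) + v(t)|^2 \, dt \\
&= \int_{-\infty}^{\infty} \bigl( u(t) + v(t) \bigr) \bigl( u(t) + v(t) \bigr)^* \, dt \\
&= \int_{-\infty}^{\infty} \Bigl( |u(t)|^2 + |v(t)|^2 + 2 \operatorname{Re}\bigl( u(t) \, v^*(t) \bigr) \Bigr) \, dt \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + 2 \operatorname{Re}\Bigl( \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \Bigr), \qquad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2, \tag{4.15}
\end{align*}

where we have used the fact that integration commutes with the operation of taking
the real part; see Proposition 2.3.1.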


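While this approach would work well for real-valued functions, it has some
embarrassing consequences when it comes to complex-valued functions. It allows for the
possibility that $\mathbf{u}$ is orthogonal to $\mathbf{v}$, but that its scaled version $\alpha \mathbf{u}$ is not. For
example, with this definition, the function $t \mapsto i \, \mathrm{I}\{|t| \le 5\}$ is orthogonal to the function
$t \mapsto \mathrm{I}\{|t| \le 17\}$ but its scaled (by $\alpha = i$) version $t \mapsto i \cdot i \, \mathrm{I}\{|t| \le 5\} = -\mathrm{I}\{|t| \le 5\}$ is
not. To avoid this embarrassment, we define $\mathbf{u}$ to be orthogonal to $\mathbf{v}$ if

\[ \|\alpha \mathbf{u} + \mathbf{v}\|_2^2 = \|\alpha \mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2, \qquad \alpha \in \mathbb{C}. \]

This, by (4.15), is equivalent to

\[ \operatorname{Re}\Bigl( \alpha \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt \Bigr) = 0, \qquad \alpha \in \mathbb{C}, \]

i.e., to the condition

\[ \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt = 0 \tag{4.16} \]

(because if $z \in \mathbb{C}$ is such that $\operatorname{Re}(\alpha z) = 0$ for all $\alpha \in \mathbb{C}$, then $z = 0$). Recalling the
definition of the inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ from (3.4)

\[ \langle \mathbf{u}, \mathbf{v} \rangle = \int_{-\infty}^{\infty} u(t) \, v^*(t) \, dt, \tag{4.17} \]

we conclude that (4.16) is equivalent to the condition $\langle \mathbf{u}, \mathbf{v} \rangle = 0$ or, equivalently
(because by (3.6) $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle^*$) to the condition $\langle \mathbf{v}, \mathbf{u} \rangle = 0$.


Definition 4.5.1 (Orthogonal Signals in $\mathcal{L}_2$). The signals $\mathbf{u}, \mathbf{v} \in \mathcal{L}_2$ are said to
be orthogonal if

\[ \langle \mathbf{u}, \mathbf{v} \rangle = 0. \tag{4.18} \]

The n-tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_n)$ is said to be orthogonal if any two signals in the tuple are
orthogonal

\[ \langle \mathbf{u}_\ell, \mathbf{u}_{\ell'} \rangle = 0, \qquad \bigl( \ell \ne \ell', \; \ell, \ell' \in \{1, \ldots, n\} \bigr). \tag{4.19} \]


The reader is encouraged to verify that if $\mathbf{u}$ is orthogonal to $\mathbf{v}$ then so is $\alpha \mathbf{u}$. Also,
$\mathbf{u}$ is orthogonal to $\mathbf{v}$ if, and only if, $\mathbf{v}$ is orthogonal to $\mathbf{u}$. Finally every function is
orthogonal to the all-zero function $\mathbf{0}$.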
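
The example that motivated Definition 4.5.1 can be reproduced numerically. In the sketch below the inner product (4.17) is approximated by a midpoint Riemann sum; the function name `inner` and the truncation to $[-20, 20]$ are assumptions of this illustration. The real part of the first inner product vanishes, so the two signals pass the initial (real-part) test, yet the inner product itself is nonzero, and scaling by $i$ exposes this.

```python
def inner(u, v, T=20.0, n=40000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [-T, T]."""
    dt = 2 * T / n
    total = 0j
    for k in range(n):
        t = -T + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

u = lambda t: 1j if abs(t) <= 5 else 0.0     # t -> i * I{|t| <= 5}
v = lambda t: 1.0 if abs(t) <= 17 else 0.0   # t -> I{|t| <= 17}
iu = lambda t: 1j * u(t)                     # u scaled by alpha = i

ip_uv = inner(u, v)     # close to 10i: real part zero, but not zero
ip_iuv = inner(iu, v)   # close to -10: real part no longer zero

print(ip_uv, ip_iuv)
```

Under Definition 4.5.1 neither pair is orthogonal, which removes the inconsistency: orthogonality is now preserved under scaling by any complex number.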


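Having judiciously defined orthogonality in $\mathcal{L}_2$, we can now extend Pythagoras’s
Theorem.


Theorem 4.5.2 (A Pythagorean Theorem). If the n-tuple of vectors $(\mathbf{u}_1, \ldots, \mathbf{u}_n)$
in $\mathcal{L}_2$ is orthogonal, then

\[ \|\mathbf{u}_1 + \cdots + \mathbf{u}_n\|_2^2 = \|\mathbf{u}_1\|_2^2 + \cdots + \|\mathbf{u}_n\|_2^2. \]


Proof. This theorem can be proved by induction on $n$. The case $n = 2$ follows
from (4.15) using Definition 4.5.1 and (4.17).


Assume now that the theorem holds for $n = \nu$, for some $\nu \ge 2$, i.e.,

\[ \|\mathbf{u}_1 + \cdots + \mathbf{u}_\nu\|_2^2 = \|\mathbf{u}_1\|_2^2 + \cdots + \|\mathbf{u}_\nu\|_2^2, \]

and let us show that this implies that it also holds for $n = \nu + 1$, i.e., that

\[ \|\mathbf{u}_1 + \cdots + \mathbf{u}_{\nu+1}\|_2^2 = \|\mathbf{u}_1\|_2^2 + \cdots + \|\mathbf{u}_{\nu+1}\|_2^2. \]

To that end, let

\[ \mathbf{v} = \mathbf{u}_1 + \cdots + \mathbf{u}_\nu. \tag{4.20} \]

Since the $\nu$-tuple $(\mathbf{u}_1, \ldots, \mathbf{u}_\nu)$ is orthogonal, our induction hypothesis guarantees
that

\[ \|\mathbf{v}\|_2^2 = \|\mathbf{u}_1\|_2^2 + \cdots + \|\mathbf{u}_\nu\|_2^2. \tag{4.21} \]

Now $\mathbf{v}$ is orthogonal to $\mathbf{u}_{\nu+1}$ because

\begin{align*}
\langle \mathbf{v}, \mathbf{u}_{\nu+1} \rangle &= \langle \mathbf{u}_1 + \cdots + \mathbf{u}_\nu, \mathbf{u}_{\nu+1} \rangle \\
&= \langle \mathbf{u}_1, \mathbf{u}_{\nu+1} \rangle + \cdots + \langle \mathbf{u}_\nu, \mathbf{u}_{\nu+1} \rangle \\
&= 0,
\end{align*}

so by the $n = 2$ case

\[ \|\mathbf{v} + \mathbf{u}_{\nu+1}\|_2^2 = \|\mathbf{v}\|_2^2 + \|\mathbf{u}_{\nu+1}\|_2^2. \tag{4.22} \]

Combining (4.20), (4.21), and (4.22) we obtain

\begin{align*}
\|\mathbf{u}_1 + \cdots + \mathbf{u}_{\nu+1}\|_2^2 &= \|\mathbf{v} + \mathbf{u}_{\nu+1}\|_2^2 \\
&= \|\mathbf{v}\|_2^2 + \|\mathbf{u}_{\nu+1}\|_2^2 \\
&= \|\mathbf{u}_1\|_2^2 + \cdots + \|\mathbf{u}_{\nu+1}\|_2^2.
\end{align*}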
Figure 4.3: The projection $\mathbf{w}$ of the vector $\mathbf{v}$ onto $\mathbf{u}$.


To derive a geometric interpretation for the inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ we next extend
to $\mathcal{L}_2$ the notion of the projection of a vector onto another. We first recall the
definition for vectors in $\mathbb{R}^2$. Consider two nonzero vectors $\mathbf{u}$ and $\mathbf{v}$ in the real
plane $\mathbb{R}^2$. The projection $\mathbf{w}$ of the vector $\mathbf{v}$ onto $\mathbf{u}$ is a scaled version of $\mathbf{u}$. More
specifically, it is a scaled version of $\mathbf{u}$ and its length is equal to the product of the
length of $\mathbf{v}$ multiplied by the cosine of the angle between $\mathbf{v}$ and $\mathbf{u}$ (see Figure 4.3).
More explicitly,

\[ \mathbf{w} = (\text{length of } \mathbf{v}) \, \cos(\text{angle between } \mathbf{v} \text{ and } \mathbf{u}) \, \frac{\mathbf{u}}{\text{length of } \mathbf{u}}. \tag{4.23} \]

This definition does not seem to have a natural extension to $\mathcal{L}_2$ because we have not
defined the angle between two signals. An alternative definition of the projection,
and one that is more amenable to extensions to $\mathcal{L}_2$, is the following. The vector $\mathbf{w}$
is the projection of the vector $\mathbf{v}$ onto $\mathbf{u}$, if $\mathbf{w}$ is a scaled version of $\mathbf{u}$, and if $\mathbf{v} - \mathbf{w}$
is orthogonal to $\mathbf{u}$.


This definition makes perfect sense in $\mathcal{L}_2$ too, because we have already defined
what we mean by “scaled version” (i.e., “amplification” or “scalar multiplication”)
and “orthogonality.” We thus have:


Definition 4.5.3 (Projection of a Signal in $\mathcal{L}_2$ onto another). Let $\mathbf{u} \in \mathcal{L}_2$ have
positive energy. The projection of the signal $\mathbf{v} \in \mathcal{L}_2$ onto the signal $\mathbf{u} \in \mathcal{L}_2$
is the signal $\mathbf{w}$ that satisfies both of the following conditions:


1) $\mathbf{w} = \alpha \mathbf{u}$ for some $\alpha \in \mathbb{C}$ and


2) $\mathbf{v} - \mathbf{w}$ is orthogonal to $\mathbf{u}$.


Note that since $\mathcal{L}_2$ is closed with respect to scalar multiplication, Condition 1)
guarantees that the projection $\mathbf{w}$ is in $\mathcal{L}_2$.


Prima facie it is not clear that a projection always exists and that it is unique.
Nevertheless, this is the case. We prove this by finding an explicit expression
for $\mathbf{w}$. We need to find some $\alpha \in \mathbb{C}$ so that $\alpha \mathbf{u}$ will satisfy the requirements of
the projection. The scalar $\alpha$ is chosen so as to guarantee that $\mathbf{v} - \mathbf{w}$ is orthogonal
to $\mathbf{u}$. That is, we seek to solve for $\alpha \in \mathbb{C}$ satisfying

\[ \langle \mathbf{v} - \alpha \mathbf{u}, \mathbf{u} \rangle = 0, \]

i.e.,

\[ \langle \mathbf{v}, \mathbf{u} \rangle - \alpha \, \|\mathbf{u}\|_2^2 = 0. \]

Recalling our hypothesis that $\|\mathbf{u}\|_2 > 0$ (strictly), we conclude that $\alpha$ is uniquely
given by

\[ \alpha = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\|\mathbf{u}\|_2^2}, \]

and the projection $\mathbf{w}$ is thus unique and is given by

\[ \mathbf{w} = \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\|\mathbf{u}\|_2^2} \, \mathbf{u}. \tag{4.24} \]


Comparing (4.23) and (4.24) we can interpret

\[ \frac{\langle \mathbf{v}, \mathbf{u} \rangle}{\|\mathbf{u}\|_2 \, \|\mathbf{v}\|_2} \tag{4.25} \]

as the cosine of the angle between the function $\mathbf{v}$ and the function $\mathbf{u}$ (provided
that neither $\mathbf{u}$ nor $\mathbf{v}$ is zero). If the inner product is zero, then we have said that
$\mathbf{v}$ and $\mathbf{u}$ are orthogonal, which is consistent with the cosine of the angle between
them being zero. Note, however, that this interpretation should be taken with a
grain of salt because in the complex case the inner product in (4.25) is typically a
complex number.


The interpretation of (4.25) as the cosine of the angle between $\mathbf{v}$ and $\mathbf{u}$ is further
supported by noting that the magnitude of (4.25) is always in the range $[0, 1]$. This
follows directly from the Cauchy-Schwarz Inequality (Theorem 3.3.1) to which we
next give another (geometric) proof. Let $\mathbf{w}$ be the projection of $\mathbf{v}$ onto $\mathbf{u}$. Then
starting from (4.24)

\begin{align*}
\frac{|\langle \mathbf{v}, \mathbf{u} \rangle|^2}{\|\mathbf{u}\|_2^2} &= \|\mathbf{w}\|_2^2 \\
&\le \|\mathbf{w}\|_2^2 + \|\mathbf{v} - \mathbf{w}\|_2^2 \\
&= \|\mathbf{w} + (\mathbf{v} - \mathbf{w})\|_2^2 \\
&= \|\mathbf{v}\|_2^2, \tag{4.26}
\end{align*}

where the first equality follows from (4.24); the subsequent inequality from the
nonnegativity of $\|\cdot\|_2$; and the subsequent equality by the Pythagorean Theorem
because, by its definition, the projection $\mathbf{w}$ of $\mathbf{v}$ onto $\mathbf{u}$ must satisfy that $\mathbf{v} - \mathbf{w}$ is
orthogonal to $\mathbf{u}$ and hence also to $\mathbf{w}$, which is a scaled version of $\mathbf{u}$. The
Cauchy-Schwarz Inequality now follows by taking the square root of both sides of (4.26).
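
The projection formula (4.24) and the resulting Cauchy-Schwarz bound lend themselves to a short numerical sketch. The inner product is approximated by a midpoint Riemann sum on a truncated interval; the names `inner` and `project` and the two test signals are hypothetical choices made for this illustration.

```python
import math

def inner(u, v, T=20.0, n=40000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [-T, T]."""
    dt = 2 * T / n
    total = 0j
    for k in range(n):
        t = -T + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

def project(v, u):
    """Return (alpha, w), where w = alpha * u is the projection of v onto u as in (4.24)."""
    alpha = inner(v, u) / inner(u, u).real
    return alpha, (lambda t: alpha * u(t))

# Illustrative signals; u has positive energy as Definition 4.5.3 requires.
u = lambda t: math.exp(-abs(t))
v = lambda t: (1 + 1j * t) if abs(t) <= 1 else 0.0

alpha, w = project(v, u)
residual = inner(lambda t: v(t) - w(t), u)                 # v - w should be orthogonal to u
lhs = abs(inner(v, u))                                     # |<v, u>|
rhs = math.sqrt(inner(u, u).real * inner(v, v).real)       # ||u|| * ||v||

print(alpha, abs(residual), lhs, rhs)
```

The residual should be zero up to rounding, and `lhs` should not exceed `rhs`, matching the geometric proof above.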
4.6 Orthonormal Bases


We next consider orthonormal bases for finite-dimensional linear subspaces. These 
are special bases that are particularly useful for the calculation of projections and 
inner products. 


4.6.1 Definition 


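Definition 4.6.1 (Orthonormal Tuple). An n-tuple of signals in $\mathcal{L}_2$ is said to be
orthonormal if it is orthogonal and if each of the signals in the tuple is of unit
energy.


Thus, the n-tuple $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_n)$ of signals in $\mathcal{L}_2$ is orthonormal, if

\[ \langle \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \rangle = \begin{cases} 0 & \text{if } \ell \ne \ell', \\ 1 & \text{if } \ell = \ell', \end{cases} \qquad \ell, \ell' \in \{1, \ldots, n\}. \tag{4.27} \]


Linearly independent tuples need not be orthonormal, but orthonormal tuples must
be linearly independent:


Proposition 4.6.2 (Orthonormal Tuples Are Linearly Independent). If a tuple of
signals in $\mathcal{L}_2$ is orthonormal, then it must be linearly independent.


Proof. Let the n-tuple $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_n)$ of signals in $\mathcal{L}_2$ be orthonormal, i.e., satisfy
(4.27). We need to show that if

\[ \sum_{\ell=1}^{n} \alpha_\ell \boldsymbol{\phi}_\ell = \mathbf{0}, \tag{4.28} \]

then all the coefficients $\alpha_1, \ldots, \alpha_n$ must be zero. To that end, assume (4.28). It
then follows that for every $\ell' \in \{1, \ldots, n\}$

\begin{align*}
0 &= \langle \mathbf{0}, \boldsymbol{\phi}_{\ell'} \rangle \\
&= \Bigl\langle \sum_{\ell=1}^{n} \alpha_\ell \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \Bigr\rangle \\
&= \sum_{\ell=1}^{n} \alpha_\ell \, \langle \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \rangle \\
&= \sum_{\ell=1}^{n} \alpha_\ell \, \mathrm{I}\{\ell = \ell'\} \\
&= \alpha_{\ell'},
\end{align*}

thus demonstrating that (4.28) implies that $\alpha_{\ell'} = 0$ for every $\ell' \in \{1, \ldots, n\}$. Here
the first equality follows because $\mathbf{0}$ is orthogonal to every energy-limited signal
and, a fortiori, to $\boldsymbol{\phi}_{\ell'}$; the second by (4.28); the third by the linearity of the inner
product in its left argument (3.7) & (3.9); and the fourth by (4.27).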
Definition 4.6.3 (Orthonormal Basis). A d-tuple of signals in $\mathcal{L}_2$ is said to form
an orthonormal basis for the linear subspace $\mathcal{U} \subset \mathcal{L}_2$ if it is orthonormal and
its span is $\mathcal{U}$.


4.6.2 Representing a Signal Using an Orthonormal Basis 


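Suppose that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal basis for $\mathcal{U} \subset \mathcal{L}_2$. The fact that
$(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ spans $\mathcal{U}$ guarantees that every $\mathbf{u} \in \mathcal{U}$ can be written as $\mathbf{u} = \sum_\ell \alpha_\ell \boldsymbol{\phi}_\ell$
for some coefficients $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$. The fact that $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is orthonormal
implies, by Proposition 4.6.2, that it is also linearly independent and hence that
the coefficients $\{\alpha_\ell\}$ are unique. How does one go about finding these coefficients?
We next show that the orthonormality of $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ also implies a very simple
expression for $\alpha_\ell$ above. Indeed, as the next proposition demonstrates, $\alpha_\ell$ is given
explicitly as $\langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle$.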


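Proposition 4.6.4 (Representing a Signal Using an Orthonormal Basis).


(i) If $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal tuple of functions in $\mathcal{L}_2$ and if $\mathbf{u} \in \mathcal{L}_2$
can be written as $\mathbf{u} = \sum_{\ell=1}^{d} \alpha_\ell \boldsymbol{\phi}_\ell$ for some complex numbers $\alpha_1, \ldots, \alpha_d$, then
$\alpha_\ell = \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle$ for every $\ell \in \{1, \ldots, d\}$:

\[ \Bigl( \mathbf{u} = \sum_{\ell=1}^{d} \alpha_\ell \boldsymbol{\phi}_\ell \Bigr) \implies \Bigl( \alpha_\ell = \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle, \quad \ell \in \{1, \ldots, d\} \Bigr), \qquad \bigl( (\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d) \text{ orthonormal} \bigr). \tag{4.29} \]


(ii) If $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal basis for the subspace $\mathcal{U} \subset \mathcal{L}_2$, then

\[ \mathbf{u} = \sum_{\ell=1}^{d} \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \qquad \mathbf{u} \in \mathcal{U}. \tag{4.30} \]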


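Proof. We begin by proving Part (i). If $\mathbf{u} = \sum_{\ell=1}^{d} \alpha_\ell \boldsymbol{\phi}_\ell$, then for every
$\ell' \in \{1, \ldots, d\}$

\begin{align*}
\langle \mathbf{u}, \boldsymbol{\phi}_{\ell'} \rangle &= \Bigl\langle \sum_{\ell=1}^{d} \alpha_\ell \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \Bigr\rangle \\
&= \sum_{\ell=1}^{d} \alpha_\ell \, \langle \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \rangle \\
&= \sum_{\ell=1}^{d} \alpha_\ell \, \mathrm{I}\{\ell = \ell'\} \\
&= \alpha_{\ell'},
\end{align*}

thus proving Part (i).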
We next prove Part (ii). Let $\mathbf{u} \in \mathcal{U}$ be arbitrary. Since, by assumption, the tuple
$(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ forms an orthonormal basis for $\mathcal{U}$ it follows a fortiori that its span
is $\mathcal{U}$ and, consequently, that there exist coefficients $\alpha_1, \ldots, \alpha_d \in \mathbb{C}$ such that

\[ \mathbf{u} = \sum_{\ell=1}^{d} \alpha_\ell \boldsymbol{\phi}_\ell. \tag{4.31} \]

It now follows from Part (i) that for each $\ell \in \{1, \ldots, d\}$ the coefficient $\alpha_\ell$ in (4.31)
must be equal to $\langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle$, thus establishing (4.30).


This proposition shows that if $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal basis for the
subspace $\mathcal{U}$ and if $\mathbf{u} \in \mathcal{U}$, then $\mathbf{u}$ is fully determined by the complex constants $\langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle$,
..., $\langle \mathbf{u}, \boldsymbol{\phi}_d \rangle$. Thus, any calculation involving $\mathbf{u}$ can be computed from these
constants by first reconstructing $\mathbf{u}$ using the proposition. As we shall see in
Proposition 4.6.9, calculations involving inner products and norms are, however, simpler
than that.
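
Part (i) of Proposition 4.6.4 can be illustrated with a small numerical sketch. The orthonormal pair below (two Haar-like step functions supported on $[0, 1)$), the coefficients, and the helper name `inner` are assumptions made purely for this example; the inner product is approximated by a midpoint Riemann sum over $[0, 1]$, outside of which both signals vanish.

```python
def inner(u, v, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [a, b]."""
    dt = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

# Hypothetical orthonormal pair supported on [0, 1).
phi1 = lambda t: 1.0 if 0 <= t < 1 else 0.0
phi2 = lambda t: (1.0 if t < 0.5 else -1.0) if 0 <= t < 1 else 0.0
basis = [phi1, phi2]

# A signal built from known coefficients ...
alphas = [2 + 1j, -3j]
u = lambda t: sum(c * phi(t) for c, phi in zip(alphas, basis))

# ... whose coefficients are recovered as inner products with the basis signals.
recovered = [inner(u, phi) for phi in basis]

print(recovered)
```

The recovered values should agree with `alphas` up to rounding, since each coefficient equals the inner product of the signal with the corresponding basis element.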


4.6.3 Projection 


We next discuss the projection of a signal $\mathbf{v} \in \mathcal{L}_2$ onto a finite-dimensional linear
subspace $\mathcal{U}$ that has an orthonormal basis $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$. To define the projection
we shall extend the approach we adopted in Section 4.5 for the projection of the
vector $\mathbf{v}$ onto the vector $\mathbf{u}$. Recall that in that section we defined the projection
as the vector $\mathbf{w}$ that is a scaled version of $\mathbf{u}$ and that satisfies that $(\mathbf{v} - \mathbf{w})$ is
orthogonal to $\mathbf{u}$. Of course, if $(\mathbf{v} - \mathbf{w})$ is orthogonal to $\mathbf{u}$, then it is orthogonal to
any scaled version of $\mathbf{u}$, i.e., it is orthogonal to every signal in the space $\operatorname{span}(\mathbf{u})$.


We would like to adopt this approach and to define the projection of $\mathbf{v} \in \mathcal{L}_2$ onto $\mathcal{U}$
as the element $\mathbf{w}$ of $\mathcal{U}$ for which $(\mathbf{v} - \mathbf{w})$ is orthogonal to every signal in $\mathcal{U}$. Before
we can adopt this definition, we must show that such an element of $\mathcal{U}$ always exists
and that it is unique.


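Lemma 4.6.5. Let $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ be an orthonormal basis for the linear subspace
$\mathcal{U} \subset \mathcal{L}_2$. Let $\mathbf{v} \in \mathcal{L}_2$ be arbitrary.


(i) The signal $\mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell$ is orthogonal to every signal in $\mathcal{U}$:

\[ \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle = 0, \qquad \bigl( \mathbf{v} \in \mathcal{L}_2, \; \mathbf{u} \in \mathcal{U} \bigr). \tag{4.32} \]


(ii) If $\mathbf{w} \in \mathcal{U}$ is such that $\mathbf{v} - \mathbf{w}$ is orthogonal to every signal in $\mathcal{U}$, then

\[ \mathbf{w} = \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell. \tag{4.33} \]


1 As we shall see in Section 4.6.5, not every finite-dimensional linear subspace of $\mathcal{L}_2$ has an
orthonormal basis. Here we shall only discuss projections onto subspaces that do.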
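Proof. To prove (4.32) we first verify that it holds when $\mathbf{u} = \boldsymbol{\phi}_{\ell'}$, for some $\ell'$ in
the set $\{1, \ldots, d\}$:

\begin{align*}
\Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \boldsymbol{\phi}_{\ell'} \Bigr\rangle &= \langle \mathbf{v}, \boldsymbol{\phi}_{\ell'} \rangle - \Bigl\langle \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \boldsymbol{\phi}_{\ell'} \Bigr\rangle \\
&= \langle \mathbf{v}, \boldsymbol{\phi}_{\ell'} \rangle - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \langle \boldsymbol{\phi}_\ell, \boldsymbol{\phi}_{\ell'} \rangle \\
&= \langle \mathbf{v}, \boldsymbol{\phi}_{\ell'} \rangle - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \mathrm{I}\{\ell = \ell'\} \\
&= 0, \qquad \ell' \in \{1, \ldots, d\}. \tag{4.34}
\end{align*}

Having verified (4.32) for $\mathbf{u} = \boldsymbol{\phi}_{\ell'}$ we next verify that this implies that it holds
for all $\mathbf{u} \in \mathcal{U}$. By Proposition 4.6.4 we obtain that any $\mathbf{u} \in \mathcal{U}$ can be written as
$\mathbf{u} = \sum_{\ell'=1}^{d} \beta_{\ell'} \boldsymbol{\phi}_{\ell'}$, where $\beta_{\ell'} = \langle \mathbf{u}, \boldsymbol{\phi}_{\ell'} \rangle$. Consequently,

\begin{align*}
\Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle &= \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \sum_{\ell'=1}^{d} \beta_{\ell'} \boldsymbol{\phi}_{\ell'} \Bigr\rangle \\
&= \sum_{\ell'=1}^{d} \beta_{\ell'}^* \, \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \boldsymbol{\phi}_{\ell'} \Bigr\rangle \\
&= \sum_{\ell'=1}^{d} \beta_{\ell'}^* \cdot 0 \\
&= 0, \qquad \mathbf{u} \in \mathcal{U},
\end{align*}

where the third equality follows from (4.34) and the basic properties of the inner
product (3.6)–(3.10).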


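We next prove Part (ii) by showing that if $\mathbf{w}, \mathbf{w}' \in \mathcal{U}$ satisfy

\[ \langle \mathbf{v} - \mathbf{w}, \mathbf{u} \rangle = 0, \qquad \mathbf{u} \in \mathcal{U} \tag{4.35} \]

and

\[ \langle \mathbf{v} - \mathbf{w}', \mathbf{u} \rangle = 0, \qquad \mathbf{u} \in \mathcal{U}, \tag{4.36} \]

then $\mathbf{w} = \mathbf{w}'$.


This follows from the calculation:

\begin{align*}
\mathbf{w} - \mathbf{w}' &= \sum_{\ell=1}^{d} \langle \mathbf{w}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell - \sum_{\ell=1}^{d} \langle \mathbf{w}', \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \\
&= \sum_{\ell=1}^{d} \langle \mathbf{w} - \mathbf{w}', \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \\
&= \sum_{\ell=1}^{d} \bigl\langle (\mathbf{w} - \mathbf{v}) + (\mathbf{v} - \mathbf{w}'), \boldsymbol{\phi}_\ell \bigr\rangle \, \boldsymbol{\phi}_\ell \\
&= \sum_{\ell=1}^{d} \bigl( \langle \mathbf{w} - \mathbf{v}, \boldsymbol{\phi}_\ell \rangle + \langle \mathbf{v} - \mathbf{w}', \boldsymbol{\phi}_\ell \rangle \bigr) \, \boldsymbol{\phi}_\ell \\
&= \sum_{\ell=1}^{d} (0 + 0) \, \boldsymbol{\phi}_\ell \\
&= \mathbf{0},
\end{align*}

where the first equality follows from Proposition 4.6.4; the second by the linearity of
the inner product in its left argument (3.9); the third by adding and subtracting $\mathbf{v}$;
the fourth by the linearity of the inner product in its left argument (3.9); and the
fifth equality from (4.35) & (4.36) applied by substituting $\boldsymbol{\phi}_\ell$ for $\mathbf{u}$.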


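With the aid of the above lemma we can now define the projection of a signal onto
a finite-dimensional subspace that has an orthonormal basis.²


Definition 4.6.6 (Projection of $\mathbf{v} \in \mathcal{L}_2$ onto $\mathcal{U}$). Let $\mathcal{U} \subset \mathcal{L}_2$ be a
finite-dimensional linear subspace of $\mathcal{L}_2$ having an orthonormal basis. Let $\mathbf{v} \in \mathcal{L}_2$ be an
arbitrary energy-limited signal. Then the projection of $\mathbf{v}$ onto $\mathcal{U}$ is the unique
element $\mathbf{w}$ of $\mathcal{U}$ such that

\[ \langle \mathbf{v} - \mathbf{w}, \mathbf{u} \rangle = 0, \qquad \mathbf{u} \in \mathcal{U}. \tag{4.37} \]


Note 4.6.7. By Lemma 4.6.5 it follows that if $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ is an orthonormal basis
for $\mathcal{U}$, then the projection of $\mathbf{v} \in \mathcal{L}_2$ onto $\mathcal{U}$ is given by

\[ \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell. \tag{4.38} \]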


To further develop the geometric picture of $\mathcal{L}_2$, we next show that, loosely speaking,
the projection of $\mathbf{v} \in \mathcal{L}_2$ onto $\mathcal{U}$ is the element in $\mathcal{U}$ that is closest to $\mathbf{v}$. This result
can also be viewed as an optimal approximation result: if we wish to approximate $\mathbf{v}$
by an element of $\mathcal{U}$, then the optimal approximation is the projection of $\mathbf{v}$ onto $\mathcal{U}$,
provided that we measure the quality of our approximation using the energy in the
error signal.


Proposition 4.6.8 (Projection as Best Approximation). Let $\mathcal{U} \subset \mathcal{L}_2$ be a
finite-dimensional subspace of $\mathcal{L}_2$ having an orthonormal basis $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$. Let $\mathbf{v} \in \mathcal{L}_2$
be arbitrary. Then the projection of $\mathbf{v}$ onto $\mathcal{U}$ is the element $\mathbf{w} \in \mathcal{U}$ that, among
all the elements of $\mathcal{U}$, is closest to $\mathbf{v}$ in the sense that

\[ \|\mathbf{v} - \mathbf{u}\|_2 \ge \|\mathbf{v} - \mathbf{w}\|_2, \qquad \mathbf{u} \in \mathcal{U}. \tag{4.39} \]


Proof. Let $\mathbf{w}$ be the projection of $\mathbf{v}$ onto $\mathcal{U}$ and let $\mathbf{u}$ be an arbitrary signal in $\mathcal{U}$.
Since, by the definition of projection, $\mathbf{w}$ is in $\mathcal{U}$ and since $\mathcal{U}$ is a linear subspace,
it follows that $\mathbf{w} - \mathbf{u} \in \mathcal{U}$. Consequently, since by the definition of the projection
$\mathbf{v} - \mathbf{w}$ is orthogonal to every element of $\mathcal{U}$, it follows that $\mathbf{v} - \mathbf{w}$ is a fortiori
orthogonal to $\mathbf{w} - \mathbf{u}$. Thus

\begin{align*}
\|\mathbf{v} - \mathbf{u}\|_2^2 &= \|(\mathbf{v} - \mathbf{w}) + (\mathbf{w} - \mathbf{u})\|_2^2 \\
&= \|\mathbf{v} - \mathbf{w}\|_2^2 + \|\mathbf{w} - \mathbf{u}\|_2^2 \tag{4.40} \\
&\ge \|\mathbf{v} - \mathbf{w}\|_2^2, \tag{4.41}
\end{align*}

where the first equality follows by subtracting and adding $\mathbf{w}$, the second equality
from the orthogonality of $(\mathbf{v} - \mathbf{w})$ and $(\mathbf{w} - \mathbf{u})$, and the final inequality by the
nonnegativity of $\|\cdot\|_2$. It follows from (4.41) that no signal in $\mathcal{U}$ is closer to $\mathbf{v}$
than $\mathbf{w}$ is. And it follows from (4.40) that if $\mathbf{u} \in \mathcal{U}$ is as close to $\mathbf{v}$ as $\mathbf{w}$ is,
then $\mathbf{u} - \mathbf{w}$ must be an element of $\mathcal{U}$ that is of zero energy. We shall see in
Proposition 4.6.10 that the hypothesis that $\mathcal{U}$ has an orthonormal basis implies
that the only zero-energy element of $\mathcal{U}$ is $\mathbf{0}$. Thus $\mathbf{u}$ and $\mathbf{w}$ must be identical, and
no other element of $\mathcal{U}$ is as close to $\mathbf{v}$ as $\mathbf{w}$ is.


2 A projection can also be defined if the subspace does not have an orthonormal basis, but in
this case there is a uniqueness issue. There may be numerous vectors $\mathbf{w} \in \mathcal{U}$ such that $\mathbf{v} - \mathbf{w}$ is
orthogonal to all vectors in $\mathcal{U}$. Fortunately, they are all indistinguishable.
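
The best-approximation property can be observed numerically. The sketch below projects a ramp onto the span of a hypothetical orthonormal pair of step functions on $[0, 1)$ using (4.38), and then compares the error energy of the projection with that of a few other elements of the span. The basis, the signal, the perturbations, and all helper names are assumptions introduced for this illustration.

```python
def inner(u, v, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [a, b]."""
    dt = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

# Hypothetical orthonormal pair supported on [0, 1).
phi1 = lambda t: 1.0 if 0 <= t < 1 else 0.0
phi2 = lambda t: (1.0 if t < 0.5 else -1.0) if 0 <= t < 1 else 0.0
basis = [phi1, phi2]

v = lambda t: t if 0 <= t < 1 else 0.0   # signal to approximate (not in the span)

def combo(coeffs):
    """Return the element of the span with the given coordinates."""
    return lambda t: sum(c * phi(t) for c, phi in zip(coeffs, basis))

def error_energy(coeffs):
    """Energy of v minus the element of the span with the given coordinates."""
    u = combo(coeffs)
    return inner(lambda t: v(t) - u(t), lambda t: v(t) - u(t)).real

proj_coeffs = [inner(v, phi) for phi in basis]   # coordinates of the projection, as in (4.38)
best_error = error_energy(proj_coeffs)

perturbations = [(0.1, 0.0), (0.0, -0.2j), (0.3 + 0.1j, 0.05)]
other_errors = [
    error_energy([c + d for c, d in zip(proj_coeffs, delta)])
    for delta in perturbations
]

print(best_error, other_errors)
```

Every perturbed candidate should have a larger error energy than the projection, and by (4.40) the excess equals the energy of the perturbation itself.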


4.6.4 Energy, Inner Products, and Orthonormal Bases 


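As demonstrated by Proposition 4.6.4, if $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ forms an orthonormal basis
for the subspace $\mathcal{U} \subset \mathcal{L}_2$, then any signal $\mathbf{u} \in \mathcal{U}$ can be reconstructed from the $d$
numbers $\langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{u}, \boldsymbol{\phi}_d \rangle$. Any quantity that can be computed from $\mathbf{u}$ can thus
be computed from $\langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{u}, \boldsymbol{\phi}_d \rangle$ by first reconstructing $\mathbf{u}$ and by then
performing the calculation on $\mathbf{u}$. But some calculations involving $\mathbf{u}$ can be performed
based on $\langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{u}, \boldsymbol{\phi}_d \rangle$ much more easily.


Proposition 4.6.9. Let $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ be an orthonormal basis for the linear subspace
$\mathcal{U} \subset \mathcal{L}_2$.


(i) The energy $\|\mathbf{u}\|_2^2$ of every $\mathbf{u} \in \mathcal{U}$ can be expressed in terms of the $d$ inner
products $\langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{u}, \boldsymbol{\phi}_d \rangle$ as

\[ \|\mathbf{u}\|_2^2 = \sum_{\ell=1}^{d} \bigl| \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle \bigr|^2. \tag{4.42} \]


(ii) More generally, if $\mathbf{v} \in \mathcal{L}_2$ (not necessarily in $\mathcal{U}$), then

\[ \|\mathbf{v}\|_2^2 \ge \sum_{\ell=1}^{d} \bigl| \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \bigr|^2 \tag{4.43} \]

with equality if, and only if, $\mathbf{v}$ is indistinguishable from some signal in $\mathcal{U}$.


(iii) The inner product between any $\mathbf{v} \in \mathcal{L}_2$ and any $\mathbf{u} \in \mathcal{U}$ can be expressed in
terms of the inner products $\{\langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle\}$ and $\{\langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle\}$ as

\[ \langle \mathbf{v}, \mathbf{u} \rangle = \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle^*. \tag{4.44} \]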
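Proof. Part (i) follows directly from the Pythagorean Theorem (Theorem 4.5.2)
applied to the d-tuple $\bigl( \langle \mathbf{u}, \boldsymbol{\phi}_1 \rangle \, \boldsymbol{\phi}_1, \ldots, \langle \mathbf{u}, \boldsymbol{\phi}_d \rangle \, \boldsymbol{\phi}_d \bigr)$.


To prove Part (ii) we expand the energy in $\mathbf{v}$ as

\begin{align*}
\|\mathbf{v}\|_2^2 &= \Bigl\| \Bigl( \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr) + \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr\|_2^2 \\
&= \Bigl\| \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr\|_2^2 + \Bigl\| \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr\|_2^2 \\
&= \Bigl\| \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr\|_2^2 + \sum_{\ell=1}^{d} \bigl| \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \bigr|^2 \\
&\ge \sum_{\ell=1}^{d} \bigl| \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \bigr|^2, \tag{4.45}
\end{align*}

where the first equality follows by subtracting and adding the projection of $\mathbf{v}$
onto $\mathcal{U}$; the second from the Pythagorean Theorem and by Lemma 4.6.5, which
guarantees that the difference between $\mathbf{v}$ and its projection is orthogonal to any
signal in $\mathcal{U}$ and hence a fortiori also to the projection itself; the third by Part (i)
applied to the projection of $\mathbf{v}$ onto $\mathcal{U}$; and the final inequality by the nonnegativity
of energy.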


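If Inequality (4.45) holds with equality, then the last inequality in its derivation
must hold with equality, so $\bigl\| \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \bigr\|_2 = 0$ and hence $\mathbf{v}$ must be
indistinguishable from the signal $\sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell$, which is in $\mathcal{U}$.
Conversely, if $\mathbf{v}$ is indistinguishable from some $\mathbf{u}' \in \mathcal{U}$, then

\begin{align*}
\|\mathbf{v}\|_2^2 &= \|(\mathbf{v} - \mathbf{u}') + \mathbf{u}'\|_2^2 \\
&= \|\mathbf{v} - \mathbf{u}'\|_2^2 + \|\mathbf{u}'\|_2^2 \\
&= \|\mathbf{u}'\|_2^2 \\
&= \sum_{\ell=1}^{d} \bigl| \langle \mathbf{u}', \boldsymbol{\phi}_\ell \rangle \bigr|^2 \\
&= \sum_{\ell=1}^{d} \bigl| \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle + \langle \mathbf{u}' - \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \bigr|^2 \\
&= \sum_{\ell=1}^{d} \bigl| \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \bigr|^2,
\end{align*}

where the first equality follows by subtracting and adding $\mathbf{u}'$; the second follows
from the Pythagorean Theorem because the fact that $\|\mathbf{v} - \mathbf{u}'\|_2 = 0$ implies that
$\langle \mathbf{v} - \mathbf{u}', \mathbf{u}' \rangle = 0$ (as can be readily verified using the Cauchy-Schwarz Inequality
$|\langle \mathbf{v} - \mathbf{u}', \mathbf{u}' \rangle| \le \|\mathbf{v} - \mathbf{u}'\|_2 \, \|\mathbf{u}'\|_2$); the third from our assumption that $\mathbf{v}$ and $\mathbf{u}'$ are
indistinguishable; the fourth from Part (i) applied to the function $\mathbf{u}'$ (which is in $\mathcal{U}$);
the fifth by adding and subtracting $\mathbf{v}$; and where the final equality follows because
$\langle \mathbf{u}' - \mathbf{v}, \boldsymbol{\phi}_\ell \rangle = 0$ (as can be readily verified from the Cauchy-Schwarz Inequality
$|\langle \mathbf{u}' - \mathbf{v}, \boldsymbol{\phi}_\ell \rangle| \le \|\mathbf{u}' - \mathbf{v}\|_2 \, \|\boldsymbol{\phi}_\ell\|_2$).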


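To prove Part (iii) we compute $\langle \mathbf{v}, \mathbf{u} \rangle$ as

\begin{align*}
\langle \mathbf{v}, \mathbf{u} \rangle &= \Bigl\langle \Bigl( \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell \Bigr) + \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \Bigl\langle \mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle + \Bigl\langle \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \Bigl\langle \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell, \; \mathbf{u} \Bigr\rangle \\
&= \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \langle \boldsymbol{\phi}_\ell, \mathbf{u} \rangle \\
&= \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle^*,
\end{align*}

where the first equality follows by subtracting and adding $\sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell$; the
second by the linearity of the inner product in its left argument (3.9); the third
because, by Lemma 4.6.5, the signal $\mathbf{v} - \sum_{\ell=1}^{d} \langle \mathbf{v}, \boldsymbol{\phi}_\ell \rangle \, \boldsymbol{\phi}_\ell$ is orthogonal to any signal
in $\mathcal{U}$ and a fortiori to $\mathbf{u}$; the fourth by the linearity of the inner product in its left
argument (3.7) & (3.9); and the final equality by (3.6).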


Proposition 4.6.9 has interesting consequences. It shows that if one thinks of $\langle \mathbf{u}, \boldsymbol{\phi}_\ell \rangle$
as the $\ell$-th coordinate of $\mathbf{u}$ (with respect to the orthonormal basis $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$),
then the energy in $\mathbf{u}$ is simply the sum of the squares of the coordinates, and the
inner product between two functions is the sum of the products of each coordinate
of $\mathbf{u}$ and the conjugate of the corresponding coordinate of $\mathbf{v}$.
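
The three parts of Proposition 4.6.9 can be checked on a toy example. As before, the orthonormal pair of step functions on $[0, 1)$, the two test signals, and the helper name `inner` are assumptions of this sketch rather than anything prescribed by the text; inner products are approximated by a midpoint Riemann sum.

```python
def inner(u, v, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of u(t) * conj(v(t)) over [a, b]."""
    dt = (b - a) / n
    total = 0j
    for k in range(n):
        t = a + (k + 0.5) * dt
        total += u(t) * complex(v(t)).conjugate()
    return total * dt

# Hypothetical orthonormal pair supported on [0, 1).
phi1 = lambda t: 1.0 if 0 <= t < 1 else 0.0
phi2 = lambda t: (1.0 if t < 0.5 else -1.0) if 0 <= t < 1 else 0.0
basis = [phi1, phi2]

u = lambda t: (2 + 1j) * phi1(t) - 3j * phi2(t)   # an element of span(phi1, phi2)
v = lambda t: t if 0 <= t < 1 else 0.0            # not in the span

u_coords = [inner(u, phi) for phi in basis]
v_coords = [inner(v, phi) for phi in basis]

# Part (i): the energy of u equals the sum of squared coordinate magnitudes (4.42).
energy_u_direct = inner(u, u).real
energy_u_coords = sum(abs(c) ** 2 for c in u_coords)

# Part (ii): for v outside the span only the inequality (4.43) holds.
energy_v_direct = inner(v, v).real
energy_v_coords = sum(abs(c) ** 2 for c in v_coords)

# Part (iii): the inner product <v, u> computed from coordinates (4.44).
ip_direct = inner(v, u)
ip_coords = sum(a * b.conjugate() for a, b in zip(v_coords, u_coords))

print(energy_u_direct, energy_u_coords)
print(energy_v_direct, energy_v_coords)
print(ip_direct, ip_coords)
```

The first and third printed pairs should agree up to rounding, whereas in the second pair the coordinate-based value is strictly smaller because the ramp is not in the span.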


We hope that the properties of orthonormal bases that we presented above have 
convinced the reader by now that there are certain advantages to describing func- 
tions using an orthonormal basis. A crucial question arises as to whether orthonor- 
mal bases always exist. This question is addressed next. 


4.6.5 Does an Orthonormal Basis Exist? 


Word on the street has it that every finite-dimensional subspace of $\mathcal{L}_2$ has an
orthonormal basis, but this is not true. (It is true for the space $L_2$ that we shall
encounter later.) For example, the set

\[
\{ u \in \mathcal{L}_2 : u(t) = 0 \text{ whenever } t \ne 17 \}
\]

of all energy-limited signals that map $t$ to zero whenever $t \ne 17$ (with the value
to which $t = 17$ is mapped being unspecified) is a one-dimensional subspace of $\mathcal{L}_2$
that does not have an orthonormal basis. (All the signals in this subspace are of
zero energy, so there are no unit-energy signals in it.)

Proposition 4.6.10. If $\mathcal{U}$ is a finite-dimensional subspace of $\mathcal{L}_2$, then the following
two statements are equivalent:


(a) U has an orthonormal basis. 


(b) The only element of U of zero energy is the all-zero signal 0. 


Proof. The proof has two parts. The first consists of showing that (a) $\Rightarrow$ (b), i.e.,
that if $\mathcal{U}$ has an orthonormal basis and if $u \in \mathcal{U}$ is of zero energy, then $u$ must
be the all-zero signal $\mathbf{0}$. The second part consists of showing that (b) $\Rightarrow$ (a), i.e.,
that if the only element of zero energy in $\mathcal{U}$ is the all-zero signal $\mathbf{0}$, then $\mathcal{U}$ has an
orthonormal basis.


We begin with the first part, namely, (a) $\Rightarrow$ (b). We thus assume that $(\phi_1, \ldots, \phi_d)$
is an orthonormal basis for $\mathcal{U}$ and that $u \in \mathcal{U}$ satisfies $\|u\|_2 = 0$ and proceed
to prove that $u = \mathbf{0}$. We simply note that, by the Cauchy-Schwarz Inequality,
$|\langle u, \phi_\ell \rangle| \le \|u\|_2 \, \|\phi_\ell\|_2$, so the condition $\|u\|_2 = 0$ implies

\[
\langle u, \phi_\ell \rangle = 0, \quad \ell \in \{1, \ldots, d\}, \tag{4.46}
\]

and hence, by Proposition 4.6.4, that $u = \mathbf{0}$.


To show (b) $\Rightarrow$ (a) we need to show that if no signal in $\mathcal{U}$ other than $\mathbf{0}$ has zero
energy, then $\mathcal{U}$ has an orthonormal basis. The proof is based on the Gram-Schmidt
Procedure, which is presented next. As we shall prove, if the input to this procedure
is a basis for $\mathcal{U}$ and if no element of $\mathcal{U}$ other than $\mathbf{0}$ is of energy zero, then the
procedure produces an orthonormal basis for $\mathcal{U}$. The procedure is actually even
more powerful. If it is fed a basis for a subspace that does contain an element other
than $\mathbf{0}$ of zero energy, then the procedure produces such an element and halts.


It should be emphasized that the Gram-Schmidt Procedure is not only useful for
proving theorems; it can be quite useful for finding orthonormal bases for practical
problems.³


4.6.6 The Gram-Schmidt Procedure 


The Gram-Schmidt Procedure is named after the mathematicians Jørgen Pedersen
Gram (1850-1916) and Erhard Schmidt (1876-1959). However, as pointed out in 
(Farebrother, 1988), this procedure was apparently already presented by Pierre- 
Simon Laplace (1749-1827) and was used by Augustin Louis Cauchy (1789-1857). 


The input to the Gram-Schmidt Procedure is a basis $(u_1, \ldots, u_d)$ for a $d$-dimensional
subspace $\mathcal{U} \subset \mathcal{L}_2$. We assume that $d \ge 1$. (The only 0-dimensional subspace of $\mathcal{L}_2$
is the subspace $\{\mathbf{0}\}$ containing the all-zero signal only, and for this subspace the
empty tuple is an orthonormal basis; there is not much else to say here.) If $\mathcal{U}$
does not contain a signal of zero energy other than the all-zero signal $\mathbf{0}$, then the
procedure runs in $d$ steps and produces an orthonormal basis for $\mathcal{U}$ (and thus also
proves that $\mathcal{U}$ does not contain a zero-energy signal other than $\mathbf{0}$). Otherwise, the


³ Numerically, however, it is unstable; see (Golub and van Loan, 1996).
procedure stops after $d$ or fewer steps and produces an element of $\mathcal{U}$ of zero energy
other than $\mathbf{0}$.


The Gram-Schmidt Procedure: 


Step 1: If $\|u_1\|_2 = 0$, then the procedure declares that there exists a
zero-energy element of $\mathcal{U}$ other than $\mathbf{0}$, it produces $u_1$ as proof, and it
halts. Otherwise, it defines

\[
\phi_1 = \frac{u_1}{\|u_1\|_2}
\]

and halts with the output $(\phi_1)$ (if $d = 1$) or proceeds to Step 2 (if
$d > 1$).

Assuming that the procedure has run for $\nu - 1$ steps without halting
and has defined the vectors $\phi_1, \ldots, \phi_{\nu-1}$, we next describe Step $\nu$.

Step $\nu$: Consider the signal

\[
\tilde{u}_\nu = u_\nu - \sum_{\ell=1}^{\nu-1} \langle u_\nu, \phi_\ell \rangle \phi_\ell. \tag{4.47}
\]

If $\|\tilde{u}_\nu\|_2 = 0$, then the procedure declares that there exists a zero-
energy element of $\mathcal{U}$ other than $\mathbf{0}$, it produces $\tilde{u}_\nu$ as proof, and it halts.
Otherwise, the procedure defines

\[
\phi_\nu = \frac{\tilde{u}_\nu}{\|\tilde{u}_\nu\|_2} \tag{4.48}
\]

and halts with the output $(\phi_1, \ldots, \phi_d)$ (if $\nu$ is equal to $d$) or proceeds
to Step $\nu + 1$ (if $\nu < d$).
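
The steps above can be sketched in code for a finite toy model. This is only an illustration under stated assumptions, not part of the text: signals are replaced by tuples of samples, and the inner product is a weighted sum in which the hypothetical weights `w` play the role of the measure (a zero weight mimics a set of Lebesgue measure zero, so a nonzero tuple can have zero energy). The names `inner` and `gram_schmidt` and the tolerance `tol`, which stands in for the exact test "energy equals zero", are likewise assumptions.

```python
import math

def inner(u, v, w):
    """Weighted inner product <u, v> = sum_k w[k] * u[k] * conj(v[k])."""
    return sum(wk * uk * complex(vk).conjugate() for uk, vk, wk in zip(u, v, w))

def gram_schmidt(basis, w, tol=1e-12):
    """Run Steps 1..d on `basis` (a list of equal-length tuples).

    Returns ("orthonormal", [phi_1, ..., phi_d]) if every step succeeds, or
    ("zero_energy", u_tilde) as soon as some u_tilde has (numerically) zero energy.
    """
    phis = []
    for u in basis:
        # u_tilde = u - sum_l <u, phi_l> phi_l, as in (4.47)
        u_tilde = [complex(a) for a in u]
        for phi in phis:
            c = inner(u, phi, w)
            u_tilde = [a - c * b for a, b in zip(u_tilde, phi)]
        energy = inner(u_tilde, u_tilde, w).real
        if energy <= tol:  # floating-point stand-in for "energy == 0"
            return "zero_energy", u_tilde
        norm = math.sqrt(energy)
        phis.append([a / norm for a in u_tilde])  # normalization, as in (4.48)
    return "orthonormal", phis

# Strictly positive weights: no nonzero tuple has zero energy, so a basis results.
status, phis = gram_schmidt([(1, 0, 0), (1, 1, 0), (1, 1, 1)], w=(1, 1, 1))

# A zero weight on the last sample: (0, 0, 1) is nonzero yet of zero energy,
# and the procedure halts and returns it as the witness.
status_bad, witness = gram_schmidt([(1, 0, 0), (0, 0, 1)], w=(1, 1, 0))
```

The second call exercises the halting branch of the procedure, which cannot occur in an ordinary inner product space.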


We next prove that the procedure behaves as we claim. 


Proof. To prove that the procedure behaves as we claim, we shall assume that the
procedure performs Step $\nu$ (i.e., that it has not halted in the steps preceding $\nu$)
and prove the following: if at Step $\nu$ the procedure declares that $\mathcal{U}$ contains a
nonzero signal of zero energy and produces $\tilde{u}_\nu$ as proof, then this is indeed the
case; otherwise, if it defines $\phi_\nu$ as in (4.48), then $(\phi_1, \ldots, \phi_\nu)$ is an orthonormal
basis for $\operatorname{span}(u_1, \ldots, u_\nu)$.


We prove this by induction on $\nu$. For $\nu = 1$ this can be verified as follows. If
$\|u_1\|_2 = 0$, then we need to show that $u_1 \in \mathcal{U}$ and that it is not equal to $\mathbf{0}$. This
follows from the assumption that the procedure's input $(u_1, \ldots, u_d)$ forms a basis
for $\mathcal{U}$, so a fortiori the signals $u_1, \ldots, u_d$ must all be elements of $\mathcal{U}$ and none
of them can be the all-zero signal. If $\|u_1\|_2 > 0$, then $\phi_1$ is a unit-energy scaled
version of $u_1$ and thus $(\phi_1)$ is an orthonormal basis for $\operatorname{span}(u_1)$.


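We now assume that our claim is true for $\nu - 1$ and proceed to prove that it is also
true for $\nu$. We thus assume that Step $\nu$ is executed and that $(\phi_1, \ldots, \phi_{\nu-1})$ is an
orthonormal basis for $\operatorname{span}(u_1, \ldots, u_{\nu-1})$:

\[
\phi_1, \ldots, \phi_{\nu-1} \in \mathcal{U}; \tag{4.49}
\]
\[
\operatorname{span}(\phi_1, \ldots, \phi_{\nu-1}) = \operatorname{span}(u_1, \ldots, u_{\nu-1}); \tag{4.50}
\]
and
\[
\langle \phi_\ell, \phi_{\ell'} \rangle = I\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, \nu - 1\}. \tag{4.51}
\]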
We need to prove that if $\tilde{u}_\nu$ is of zero energy, then it is a nonzero element of $\mathcal{U}$ of
zero energy, and that otherwise the $\nu$-tuple $(\phi_1, \ldots, \phi_\nu)$ is an orthonormal basis
for $\operatorname{span}(u_1, \ldots, u_\nu)$. To that end we first prove that

\[
\tilde{u}_\nu \in \mathcal{U} \tag{4.52}
\]

and that

\[
\tilde{u}_\nu \ne \mathbf{0}. \tag{4.53}
\]

We begin with a proof of (4.52). Since (4.47) expresses $\tilde{u}_\nu$ as a linear combination
of $(\phi_1, \ldots, \phi_{\nu-1}, u_\nu)$, and since $\mathcal{U}$ is by assumption a linear subspace, it suffices to
show that $\phi_1, \ldots, \phi_{\nu-1} \in \mathcal{U}$ and that $u_\nu \in \mathcal{U}$. The former follows from (4.49) and
the latter from our assumption that $(u_1, \ldots, u_d)$ forms a basis for $\mathcal{U}$.


We next prove (4.53). By (4.47) it suffices to show that $u_\nu \notin \operatorname{span}(\phi_1, \ldots, \phi_{\nu-1})$.
By (4.50) this is equivalent to showing that $u_\nu \notin \operatorname{span}(u_1, \ldots, u_{\nu-1})$, which
follows from our assumption that $(u_1, \ldots, u_d)$ is a basis for $\mathcal{U}$ and a fortiori linearly
independent.


Having established (4.52) and (4.53) it follows that if $\|\tilde{u}_\nu\|_2 = 0$, then $\tilde{u}_\nu$ is a
nonzero element of $\mathcal{U}$ which is of zero energy as we had claimed.


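To conclude the proof we now assume $\|\tilde{u}_\nu\|_2 > 0$ and prove that $(\phi_1, \ldots, \phi_\nu)$ is
an orthonormal basis for $\operatorname{span}(u_1, \ldots, u_\nu)$. That $(\phi_1, \ldots, \phi_\nu)$ is orthonormal
follows because (4.51) guarantees that $(\phi_1, \ldots, \phi_{\nu-1})$ is orthonormal; because (4.48)
guarantees that $\phi_\nu$ is of unit energy; and because Lemma 4.6.5 (applied to the
linear subspace $\operatorname{span}(\phi_1, \ldots, \phi_{\nu-1})$) guarantees that $\tilde{u}_\nu$ (and hence also its scaled
version $\phi_\nu$) is orthogonal to every element of $\operatorname{span}(\phi_1, \ldots, \phi_{\nu-1})$ and in
particular to $\phi_1, \ldots, \phi_{\nu-1}$. It thus only remains to show that $\operatorname{span}(\phi_1, \ldots, \phi_\nu) =
\operatorname{span}(u_1, \ldots, u_\nu)$. We first show that $\operatorname{span}(\phi_1, \ldots, \phi_\nu) \subseteq \operatorname{span}(u_1, \ldots, u_\nu)$. This
follows because (4.50) implies that

\[
\phi_1, \ldots, \phi_{\nu-1} \in \operatorname{span}(u_1, \ldots, u_{\nu-1}); \tag{4.54}
\]
because (4.54), (4.47) and (4.48) imply that
\[
\phi_\nu \in \operatorname{span}(u_1, \ldots, u_\nu); \tag{4.55}
\]
and because (4.54) and (4.55) imply that $\phi_1, \ldots, \phi_\nu \in \operatorname{span}(u_1, \ldots, u_\nu)$ and hence
that $\operatorname{span}(\phi_1, \ldots, \phi_\nu) \subseteq \operatorname{span}(u_1, \ldots, u_\nu)$. The reverse inclusion can be argued
very similarly: by (4.50)
\[
u_1, \ldots, u_{\nu-1} \in \operatorname{span}(\phi_1, \ldots, \phi_{\nu-1}); \tag{4.56}
\]
by (4.47) and (4.48) we can express $u_\nu$ as a linear combination of $(\phi_1, \ldots, \phi_\nu)$
\[
u_\nu = \|\tilde{u}_\nu\|_2 \, \phi_\nu + \sum_{\ell=1}^{\nu-1} \langle u_\nu, \phi_\ell \rangle \phi_\ell; \tag{4.57}
\]

and (4.56) & (4.57) combine to prove that $u_1, \ldots, u_\nu \in \operatorname{span}(\phi_1, \ldots, \phi_\nu)$ and hence
that $\operatorname{span}(u_1, \ldots, u_\nu) \subseteq \operatorname{span}(\phi_1, \ldots, \phi_\nu)$.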

By far the more important scenario for us is when $\mathcal{U}$ does not contain a nonzero
element of zero energy. This is because we shall mostly focus on signals that are
bandlimited (see Chapter 6), and the only energy-limited signal that is bandlimited
to W Hz and that has zero energy is the all-zero signal (Note 6.4.2). For subspaces
not containing zero-energy signals other than $\mathbf{0}$ the key properties to note about
the signals $\phi_1, \ldots, \phi_d$ produced by the Gram-Schmidt procedure are that they
satisfy for each $\nu \in \{1, \ldots, d\}$

\[
\operatorname{span}(u_1, \ldots, u_\nu) = \operatorname{span}(\phi_1, \ldots, \phi_\nu) \tag{4.58a}
\]

and

\[
(\phi_1, \ldots, \phi_\nu) \text{ is an orthonormal basis for } \operatorname{span}(u_1, \ldots, u_\nu). \tag{4.58b}
\]

These properties are, of course, of greatest importance when $\nu = d$.

We next provide an example of the Gram-Schmidt procedure.


Example 4.6.11. Consider the following three signals: $u_1 \colon t \mapsto I\{0 \le t \le 1\}$,
$u_2 \colon t \mapsto t \, I\{0 \le t \le 1\}$, and $u_3 \colon t \mapsto t^2 \, I\{0 \le t \le 1\}$. The tuple $(u_1, u_2, u_3)$ forms
a basis for the subspace of all signals of the form $t \mapsto p(t) \, I\{0 \le t \le 1\}$, where $p(\cdot)$
is a polynomial of degree smaller than 3. To construct an orthonormal basis for
this subspace with the Gram-Schmidt Procedure, we begin by normalizing $u_1$. To
that end, we compute

\[
\|u_1\|_2^2 = \int_{-\infty}^{\infty} \bigl| I\{0 \le t \le 1\} \bigr|^2 \, dt = 1
\]
and set $\phi_1 = u_1 / \|u_1\|_2$, so
\[
\phi_1 \colon t \mapsto I\{0 \le t \le 1\}. \tag{4.59a}
\]


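The second function $\phi_2$ is now obtained by normalizing $u_2 - \langle u_2, \phi_1 \rangle \phi_1$. We first
compute the inner product $\langle u_2, \phi_1 \rangle$

\[
\langle u_2, \phi_1 \rangle = \int_{-\infty}^{\infty} I\{0 \le t \le 1\} \, t \, I\{0 \le t \le 1\} \, dt = \int_0^1 t \, dt = \frac{1}{2}
\]
to obtain that $u_2 - \langle u_2, \phi_1 \rangle \phi_1 \colon t \mapsto (t - 1/2) \, I\{0 \le t \le 1\}$, which is of energy

\[
\| u_2 - \langle u_2, \phi_1 \rangle \phi_1 \|_2^2 = \int_0^1 \Bigl( t - \frac{1}{2} \Bigr)^2 dt = \frac{1}{12}.
\]

Hence,
\[
\phi_2 \colon t \mapsto \sqrt{12} \, \Bigl( t - \frac{1}{2} \Bigr) I\{0 \le t \le 1\}. \tag{4.59b}
\]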


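The third function $\phi_3$ is the normalized version of $u_3 - \langle u_3, \phi_1 \rangle \phi_1 - \langle u_3, \phi_2 \rangle \phi_2$.
The inner products $\langle u_3, \phi_1 \rangle$ and $\langle u_3, \phi_2 \rangle$ are respectively

\[
\langle u_3, \phi_1 \rangle = \int_0^1 t^2 \, dt = \frac{1}{3},
\]

\[
\langle u_3, \phi_2 \rangle = \int_0^1 t^2 \sqrt{12} \, \Bigl( t - \frac{1}{2} \Bigr) dt = \frac{1}{\sqrt{12}}.
\]

Consequently

\[
u_3 - \langle u_3, \phi_1 \rangle \phi_1 - \langle u_3, \phi_2 \rangle \phi_2 \colon t \mapsto \Bigl( t^2 - \frac{1}{3} - \Bigl( t - \frac{1}{2} \Bigr) \Bigr) I\{0 \le t \le 1\}
\]

with corresponding energy

\[
\| u_3 - \langle u_3, \phi_1 \rangle \phi_1 - \langle u_3, \phi_2 \rangle \phi_2 \|_2^2 = \int_0^1 \Bigl( t^2 - t + \frac{1}{6} \Bigr)^2 dt = \frac{1}{180}.
\]

Hence, the orthonormal basis is completed by the third function

\[
\phi_3 \colon t \mapsto \sqrt{180} \, \Bigl( t^2 - t + \frac{1}{6} \Bigr) I\{0 \le t \le 1\}. \tag{4.59c}
\]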
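
The arithmetic of this example can be checked exactly with rational numbers. The sketch below is an illustration only: it represents a polynomial on the unit interval by its list of coefficients (lowest degree first), and it works with the unnormalized residuals so as to avoid square roots, a small deviation from the procedure as stated. The helper names `poly_mul`, `poly_inner`, and `poly_sub_scaled` are hypothetical.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_inner(p, q):
    """Integral over [0, 1] of p(t) q(t) dt (real coefficients, so no conjugate)."""
    return sum(c / (k + 1) for k, c in enumerate(poly_mul(p, q)))

def poly_sub_scaled(p, c, q):
    """Return p - c * q, padding the shorter list with zeros."""
    n = max(len(p), len(q))
    p = p + [Fraction(0)] * (n - len(p))
    q = q + [Fraction(0)] * (n - len(q))
    return [a - c * b for a, b in zip(p, q)]

basis = [[Fraction(1)], [Fraction(0), Fraction(1)], [Fraction(0), Fraction(0), Fraction(1)]]

residuals = []  # unnormalized versions of phi_1, phi_2, phi_3
for u in basis:
    r = u
    for prev in residuals:
        # Subtract the projection of u onto prev: <u, prev> / <prev, prev> * prev.
        r = poly_sub_scaled(r, poly_inner(u, prev) / poly_inner(prev, prev), prev)
    residuals.append(r)

energies = [poly_inner(r, r) for r in residuals]
```

Here `energies` holds the squared norms of the residuals, whose reciprocals are the squares of the scaling constants in (4.59a) to (4.59c), and `residuals[2]` holds the coefficients of the quadratic in (4.59c).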


4.7 The Space $L_2$


Very informally one can describe the space $L_2$ as the space of all energy-limited
complex-valued signals, where we think of two signals as being different only if they
are distinguishable. This section defines $L_2$ more precisely. It can be skipped
because we shall have only little to do with $L_2$. Understanding this space is, however,
important for readers who wish to fully understand how the Fourier Transform is
defined for energy-limited signals that are not integrable (Section 6.2.3). Readers
who continue should recall from Section 2.5 that two energy-limited signals $u$ and $v$
are said to be indistinguishable if the set $\{t \in \mathbb{R} : u(t) \ne v(t)\}$ is of Lebesgue
measure zero. We write $u \equiv v$ to indicate that $u$ and $v$ are indistinguishable. By
Proposition 2.5.3, the condition $u \equiv v$ is equivalent to the condition $\|u - v\|_2 = 0$.


To motivate the definition of the space $L_2$, we begin by noting that the space $\mathcal{L}_2$
of energy-limited signals is “almost” an example of what mathematicians call an
“inner product space,” but it is not. The problem is that mathematicians insist
that in an inner product space the only vector whose inner product with itself is
zero be the zero vector. This is not the case in $\mathcal{L}_2$: it is possible that $u \in \mathcal{L}_2$
satisfy $\langle u, u \rangle = 0$ (i.e., $\|u\|_2 = 0$) and yet not be the all-zero signal $\mathbf{0}$. From the
condition $\|u\|_2 = 0$ we can only infer that $u$ is indistinguishable from $\mathbf{0}$.


The fact that $\mathcal{L}_2$ is not an inner product space is an annoyance because it
precludes us from borrowing from the vast literature on inner product spaces (and
Hilbert spaces, which are special kinds of inner product spaces), and because it
does not allow us to view some of the results about $\mathcal{L}_2$ as instances of more
general principles. For this reason mathematicians prefer to study the space $L_2$, which
is an inner product space (and which is, in fact, a Hilbert space) rather than $\mathcal{L}_2$.
Unfortunately, for this luxury they pay a certain price that I am loath to pay.
Consequently, in most of this book I have decided to stick to $\mathcal{L}_2$ even though this
precludes me from using the standard results on inner product spaces. The price
one pays for using $L_2$ will become apparent once we define it.


To understand how $L_2$ is constructed it is useful to note that the relation “$u \equiv v$”,
i.e., “$u$ is indistinguishable from $v$” is an equivalence relation on $\mathcal{L}_2$, i.e., it
satisfies

\[
u \equiv u, \quad u \in \mathcal{L}_2; \qquad \text{(reflexive)}
\]
\[
(u \equiv v) \iff (v \equiv u), \quad u, v \in \mathcal{L}_2; \qquad \text{(symmetric)}
\]
and
\[
(u \equiv v \text{ and } v \equiv w) \implies (u \equiv w), \quad u, v, w \in \mathcal{L}_2. \qquad \text{(transitive)}
\]


Using these properties one can verify that if for every $u \in \mathcal{L}_2$ we define its
equivalence class $[u]$ as
\[
[u] = \{ \tilde{u} \in \mathcal{L}_2 : \tilde{u} \equiv u \}, \tag{4.60}
\]

then two equivalence classes $[u]$ and $[v]$ must be either identical or disjoint. In
fact, the sets $[u] \subset \mathcal{L}_2$ and $[v] \subset \mathcal{L}_2$ are identical if, and only if, $u$ and $v$ are
indistinguishable

\[
\bigl( [u] = [v] \bigr) \iff \bigl( \|u - v\|_2 = 0 \bigr), \quad u, v \in \mathcal{L}_2,
\]
and they are disjoint if, and only if, $u$ and $v$ are distinguishable

\[
\bigl( [u] \cap [v] = \emptyset \bigr) \iff \bigl( \|u - v\|_2 > 0 \bigr), \quad u, v \in \mathcal{L}_2.
\]

We define $L_2$ as the set of all such equivalence classes
\[
L_2 = \{ [u] : u \in \mathcal{L}_2 \}. \tag{4.61}
\]

Thus, the elements of $L_2$ are not functions, but sets of functions. Each element
of $L_2$ is an equivalence class, i.e., a set of the form $[u]$ for some $u \in \mathcal{L}_2$. And for
each $u \in \mathcal{L}_2$ the equivalence class $[u]$ is an element of $L_2$.


As we next show, the space $L_2$ can also be viewed as a vector space. To this end
we need to first define “amplification of an equivalence class by a scalar $\alpha \in \mathbb{C}$” and
“superposition of two equivalence classes.” How do we define the scaling-by-$\alpha$ of
an equivalence class $S \in L_2$? A natural approach is to find some function $u \in \mathcal{L}_2$
such that $S$ is its equivalence class (i.e., satisfying $S = [u]$), and to define the
scaling-by-$\alpha$ of $S$ as the equivalence class of $\alpha u$, i.e., as $[\alpha u]$. Thus we would define
$\alpha S$ as the equivalence class of the signal $t \mapsto \alpha u(t)$. While this turns out to be
a good approach, the careful reader might be concerned by something. Suppose
that $S = [u]$ but that also $S = [\tilde{u}]$. Should $\alpha S$ be defined as the equivalence class
of $t \mapsto \alpha u(t)$ or of $t \mapsto \alpha \tilde{u}(t)$? Fortunately, it does not matter because the two
equivalence classes are the same! Indeed, if $[u] = [\tilde{u}]$, then the equivalence class of
$t \mapsto \alpha u(t)$ is equal to the equivalence class of $t \mapsto \alpha \tilde{u}(t)$ (because $[u] = [\tilde{u}]$ implies
that $u$ and $\tilde{u}$ agree except on a set of measure zero so $\alpha u$ and $\alpha \tilde{u}$ also agree except
on a set of measure zero, which in turn implies that $[\alpha u] = [\alpha \tilde{u}]$).


Similarly, one can show that if $S_1 \in L_2$ and $S_2 \in L_2$ are two equivalence classes,
then we can define their sum (or superposition) $S_1 + S_2$ as $[u_1 + u_2]$ where $u_1$
is any function in $\mathcal{L}_2$ such that $S_1 = [u_1]$ and where $u_2$ is any function in $\mathcal{L}_2$
such that $S_2 = [u_2]$. Again, to make sure that the result of the superposition of
$S_1$ and $S_2$ does not depend on the choice of $u_1$ and $u_2$ we need to verify that if
$S_1 = [u_1] = [\tilde{u}_1]$ and if $S_2 = [u_2] = [\tilde{u}_2]$ then $[u_1 + u_2] = [\tilde{u}_1 + \tilde{u}_2]$. This is not
difficult but is omitted.
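
Using these definitions and by defining the zero vector to be the equivalence
class $[\mathbf{0}]$, it is not difficult to show that $L_2$ forms a linear space over the
complex field. To make it into an inner product space we need to define the inner
product $\langle S_1, S_2 \rangle$ between two equivalence classes. If $S_1 = [u_1]$ and if $S_2 = [u_2]$
we define the inner product $\langle S_1, S_2 \rangle$ as the complex number $\langle u_1, u_2 \rangle$. Again, we
have to show that our definition is good in the sense that it does not depend on
the particular choice of $u_1$ and $u_2$. More specifically, we need to verify that if
$S_1 = [u_1] = [\tilde{u}_1]$ and if $S_2 = [u_2] = [\tilde{u}_2]$ then $\langle u_1, u_2 \rangle = \langle \tilde{u}_1, \tilde{u}_2 \rangle$. This can be
proved as follows:

\[
\begin{aligned}
\langle u_1, u_2 \rangle
&= \langle (u_1 - \tilde{u}_1) + \tilde{u}_1, \; u_2 \rangle \\
&= \langle u_1 - \tilde{u}_1, \; u_2 \rangle + \langle \tilde{u}_1, u_2 \rangle \\
&= \langle \tilde{u}_1, u_2 \rangle \\
&= \langle \tilde{u}_1, \; (u_2 - \tilde{u}_2) + \tilde{u}_2 \rangle \\
&= \langle \tilde{u}_1, \; u_2 - \tilde{u}_2 \rangle + \langle \tilde{u}_1, \tilde{u}_2 \rangle \\
&= \langle \tilde{u}_1, \tilde{u}_2 \rangle,
\end{aligned}
\]

where the third equality follows because $[u_1] = [\tilde{u}_1]$ implies that $\|u_1 - \tilde{u}_1\|_2 = 0$
and hence that $\langle u_1 - \tilde{u}_1, u_2 \rangle = 0$ (Cauchy-Schwarz Inequality), and where the
last equality follows by a similar reasoning about $u_2$ and $\tilde{u}_2$. Using the above
definition of the inner product between equivalence classes one can show that if for
some equivalence class $S$ we have $\langle S, S \rangle = 0$, then $S$ is the zero vector, i.e., the
equivalence class $[\mathbf{0}]$.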


With these definitions of the scaling of an equivalence class by a scalar, the
superposition of two equivalence classes, and the inner product between two equivalence
classes, the space of equivalence classes $L_2$ becomes an inner product space in the
sense that mathematicians like. In fact, it is a Hilbert space.


What is the price we have to pay for working in an inner product space? It
is that the elements of $L_2$ are not functions but equivalence classes and that it
is meaningless to talk about the value they take at a given time. For example,
it is meaningless to discuss the supremum (or maximum) of an element of $L_2$.⁴
To add to the confusion, mathematicians refer to elements of $L_2$ as “functions”
(even though they are equivalence classes of functions), and they drop the square
brackets. Things get even trickier when one deals with signals contaminated by
noise. If one views the signals as elements of $L_2$, then the result of adding noise to
them is not a stochastic process (Definition 12.2.1 ahead). We find this price too
high, and in this book we shall mostly deal with $\mathcal{L}_2$.


4.8 Additional Reading 


Most of the results of this chapter follow from basic results on inner product
spaces and can be found, for example, in (Axler, 1997). However, since $\mathcal{L}_2$ is not
an inner product space, we had to introduce some slight modifications.


⁴ To deal with this, mathematicians define the essential supremum.

More on the definition of the space $L_2$ can be found in most texts on analysis. See,
for example, (Rudin, 1974, Chapter 3, Remark 3.10) and (Royden, 1988, Chapter 1,
Section 7).


4.9 Exercises 


Exercise 4.1 (Linear Subspace). Consider the set of signals u of the form u: t > e' p(t), 
where $p(\cdot)$ is a polynomial whose degree does not exceed $d$. Is this a linear subspace of $\mathcal{L}_2$?
If yes, find a basis for this subspace. 


Exercise 4.2 (Characterizing Infinite-Dimensional Subspaces). Recall that we say that a
linear subspace is infinite dimensional if it is not of finite dimension. Show that a linear
subspace $\mathcal{U}$ is infinite dimensional if, and only if, there exists a sequence $u_1, u_2, \ldots$ of
elements of $\mathcal{U}$ such that for every $n \in \mathbb{N}$ the tuple $(u_1, \ldots, u_n)$ is linearly independent.


Exercise 4.3 ($\mathcal{L}_2$ Is Infinite Dimensional). Show that $\mathcal{L}_2$ is infinite dimensional.


Hint: Exercises 4.1 and 4.2 may be useful. 


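Exercise 4.4 (Separation between Signals). Given $u_1, u_2 \in \mathcal{L}_2$, let $\mathcal{V}$ be the set of all
complex signals $v$ that are equidistant to $u_1$ and $u_2$:

\[
\mathcal{V} = \bigl\{ v \in \mathcal{L}_2 : \|v - u_1\|_2 = \|v - u_2\|_2 \bigr\}.
\]

(i) Show that

\[
\mathcal{V} = \Bigl\{ v \in \mathcal{L}_2 : \operatorname{Re}\bigl( \langle v, u_2 - u_1 \rangle \bigr) = \frac{\|u_2\|_2^2 - \|u_1\|_2^2}{2} \Bigr\}.
\]

(ii) Is $\mathcal{V}$ a linear subspace of $\mathcal{L}_2$?

(iii) Show that $(u_1 + u_2)/2 \in \mathcal{V}$.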


Exercise 4.5 (Projecting a Signal). Let $u \in \mathcal{L}_2$ be of positive energy, and let $v \in \mathcal{L}_2$ be
arbitrary.


(i) Show that Definitions 4.6.6 and 4.5.3 agree in the sense that the projection of $v$
onto $\operatorname{span}(u)$ (according to Definition 4.6.6) is the same as the projection of $v$ onto
the signal $u$ (according to Definition 4.5.3).


(ii) Show that if the signal $u$ is an element of a finite-dimensional subspace $\mathcal{U}$ having
an orthonormal basis, then the projection of $u$ onto $\mathcal{U}$ is given by $u$.


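Exercise 4.6 (Orthogonal Subspace). Given signals $v_1, \ldots, v_n \in \mathcal{L}_2$, define the set

\[
\mathcal{U} = \bigl\{ u \in \mathcal{L}_2 : \langle u, v_1 \rangle = \langle u, v_2 \rangle = \cdots = \langle u, v_n \rangle = 0 \bigr\}.
\]

Show that $\mathcal{U}$ is a linear subspace of $\mathcal{L}_2$.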


Exercise 4.7 (Constructing an Orthonormal Basis). Let $T_s$ be a positive constant.
Consider the signals $s_1 \colon t \mapsto I\{0 \le t \le T_s/2\} - I\{T_s/2 < t \le T_s\}$; $s_2 \colon t \mapsto I\{0 \le t \le T_s\}$;
$s_3 \colon t \mapsto I\{0 \le t \le T_s/4\} + I\{3T_s/4 \le t \le T_s\}$; and $s_4 \colon t \mapsto I\{0 \le t \le T_s/4\} - I\{3T_s/4 \le
t \le T_s\}$.

(i) Plot $s_1$, $s_2$, $s_3$, and $s_4$.

(ii) Find an orthonormal basis for $\operatorname{span}(s_1, s_2, s_3, s_4)$.

(iii) Express each of the signals $s_1$, $s_2$, $s_3$, and $s_4$ as a linear combination of the basis
vectors found in Part (ii).


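Exercise 4.8 (Is the $\mathcal{L}_2$-Limit Unique?). Show that for signals $\zeta, x_1, x_2, \ldots$ in $\mathcal{L}_2$ the
statement

\[
\lim_{n \to \infty} \|x_n - \zeta\|_2 = 0
\]

is equivalent to the statement

\[
\Bigl( \lim_{n \to \infty} \|x_n - \tilde{\zeta}\|_2 = 0 \Bigr) \iff \bigl( \tilde{\zeta} \in [\zeta] \bigr).
\]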


Exercise 4.9 (Signals of Zero Energy). Given $v_1, \ldots, v_n \in \mathcal{L}_2$, show that there exist
integers $1 \le \nu_1 < \nu_2 < \cdots < \nu_d \le n$ such that the following three conditions hold:
the $d$-tuple $(v_{\nu_1}, \ldots, v_{\nu_d})$ is linearly independent; $\operatorname{span}(v_{\nu_1}, \ldots, v_{\nu_d})$ contains no signal
of zero energy other than the all-zero signal $\mathbf{0}$; and each element of $\operatorname{span}(v_1, \ldots, v_n)$ is
indistinguishable from some element of $\operatorname{span}(v_{\nu_1}, \ldots, v_{\nu_d})$.


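Exercise 4.10 (Orthogonal Subspace). Given $v_1, \ldots, v_n \in \mathcal{L}_2$, define the set
\[
\mathcal{U} = \bigl\{ u \in \mathcal{L}_2 : \langle u, v_1 \rangle = \langle u, v_2 \rangle = \cdots = \langle u, v_n \rangle = 0 \bigr\},
\]

and the set of all energy-limited signals that are orthogonal to all the signals in $\mathcal{U}$:
\[
\mathcal{U}^{\perp} = \bigl\{ w \in \mathcal{L}_2 : \bigl( \langle w, u \rangle = 0, \; u \in \mathcal{U} \bigr) \bigr\}.
\]

(i) Show that $\mathcal{U}^{\perp}$ is a linear subspace of $\mathcal{L}_2$.

(ii) Show that an energy-limited signal is in $\mathcal{U}^{\perp}$ if, and only if, it is indistinguishable
from some element of $\operatorname{span}(v_1, \ldots, v_n)$.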


Hint: For Part (ii) you may find Exercise 4.9 useful. 


Exercise 4.11 (More on Indistinguishability). Given $v_1, \ldots, v_n \in \mathcal{L}_2$ and some $w \in \mathcal{L}_2$,
propose an algorithm to check whether there exists an element of $\operatorname{span}(v_1, \ldots, v_n)$ that
is indistinguishable from $w$.


Hint: Exercise 4.9 may be useful. 


Chapter 5 


Convolutions and Filters 


5.1 Introduction 


Convolutions play a central role in the analysis of linear systems, and it is thus 
not surprising that they will appear repeatedly in this book. Most of the readers 
have probably seen the definition and key properties in an earlier course on linear 
systems, so this chapter can be viewed as a very short review. New perhaps is 
the following section on notation and the all-important Section 5.8 on the matched 
filter and its use in calculating inner products. 


5.2 Time Shifts and Reflections 


Suppose that $x \colon \mathbb{R} \to \mathbb{R}$ is a real signal, where we think of the argument as being
time. Such functions are typically plotted on paper with the time arrow pointing
to the right. Take a moment to plot an example of such a function, and on the
same coordinates plot the function

\[
t \mapsto x(t - t_0),
\]

which maps every $t \in \mathbb{R}$ to $x(t - t_0)$ for some positive $t_0$. Repeat with $t_0$ being
negative. This may seem like a mindless exercise but there is a point to it. It
will help you understand convolutions graphically and help you visualize mappings
such as $t \mapsto \sum_{\ell} \alpha_\ell \, g(t - \ell T_s)$, which we will encounter later in our study of Pulse
Amplitude Modulation (PAM). It will also help you visualize the matched filter.


Given a complex signal $x \colon \mathbb{R} \to \mathbb{C}$, we denote its reflection or mirror image
by $\overleftarrow{x}$:

\[
\overleftarrow{x} \colon t \mapsto x(-t). \tag{5.1}
\]

Its plot is the mirror image of the plot of $x(\cdot)$ about the vertical axis. The mirror
image of the mirror image of $x$ is $x$.


5.3 The Convolution Expression


The convolution $x \star h$ between two complex signals $x \colon \mathbb{R} \to \mathbb{C}$ and $h \colon \mathbb{R} \to \mathbb{C}$ is
formally defined as the complex signal whose time-$t$ value $(x \star h)(t)$ is given by

\[
(x \star h)(t) = \int_{-\infty}^{\infty} x(\tau) \, h(t - \tau) \, d\tau. \tag{5.2}
\]
Note that the integrand in the above is complex. (See Section 2.3 for a discussion
of such integrals.) This definition also holds for real signals.


We used the term “formally defined” because certain conditions need to be met
for this integral to be defined. It is conceivable that for some $t \in \mathbb{R}$ the integrand
$\tau \mapsto x(\tau) \, h(t - \tau)$ will not be integrable, so the integral will be undefined. (Recall
that in this book we only allow integrals of the form $\int_{-\infty}^{\infty} g(t) \, dt$ if the integrand
$g(\cdot)$ is in $\mathcal{L}_1$ so $\int_{-\infty}^{\infty} |g(t)| \, dt < \infty$. Otherwise, we say that the integral $\int_{-\infty}^{\infty} g(t) \, dt$
is undefined.) We thus say that $x \star h$ is defined at $t \in \mathbb{R}$ if $\tau \mapsto x(\tau) \, h(t - \tau)$ is
integrable.


While (5.2) does not make it apparent, the convolution is in fact symmetric in $x$
and $h$. Thus, the integral in (5.2) is defined for a given $t$ if, and only if, the integral

\[
\int_{-\infty}^{\infty} h(\sigma) \, x(t - \sigma) \, d\sigma \tag{5.3}
\]

is defined. And if both are defined, then their values are identical. This follows
directly by the change of variable $\sigma \triangleq t - \tau$.
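
The symmetry of (5.2) and (5.3) can be observed numerically. The following is a rough sketch, not a method from the text: it approximates the integral by a midpoint Riemann sum over a finite window, and the function names, the window `[lo, hi]`, the number of subintervals `n`, and the two test signals are all assumptions chosen for illustration.

```python
import math

def riemann_convolution(x, h, t, lo=-10.0, hi=10.0, n=4000):
    """Midpoint-rule approximation of the integral of x(tau) * h(t - tau) over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.0
    for k in range(n):
        tau = lo + (k + 0.5) * step
        total += x(tau) * h(t - tau)
    return total * step

def rect(t):
    """Indicator of the interval [0, 1]."""
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def decay(t):
    """One-sided exponential: exp(-t) for t >= 0 and zero otherwise."""
    return math.exp(-t) if t >= 0.0 else 0.0

t = 0.5
x_conv_h = riemann_convolution(rect, decay, t)   # integral as in (5.2)
h_conv_x = riemann_convolution(decay, rect, t)   # integral as in (5.3)
closed_form = 1.0 - math.exp(-t)                 # exact value for 0 <= t <= 1
```

For these two signals and $0 \le t \le 1$ the integral evaluates in closed form to $1 - e^{-t}$, which gives an independent check on both approximations.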


5.4 Thinking About the Convolution 


Depending on the application, we can think about the convolution operation in a 
number of different ways. 


(i) Especially when $h(\cdot)$ is nonnegative and integrates to one, one can think of
the convolution as an averaging, or smoothing, operation. Thus, when $x$ is
convolved with $h$ the result at time $t_0$ is not $x(t_0)$ but rather a smoothed
version thereof, namely, $\int_{-\infty}^{\infty} x(t_0 - \tau) \, h(\tau) \, d\tau$. For example, if $h$ is the
mapping $t \mapsto I\{|t| \le T/2\}/T$ for some $T > 0$, then the convolution $x \star h$ at time
$t_0$ is not $x(t_0)$ but rather

\[
\frac{1}{T} \int_{t_0 - T/2}^{t_0 + T/2} x(\tau) \, d\tau.
\]

Thus, in this example, we can think of $x \star h$ as being a “moving average,” or
a “sliding-window average” of $x$.


(ii) For energy-limited signals it is sometimes beneficial to think about $(x \star h)(t_0)$
as the inner product between the functions $\tau \mapsto x(\tau)$ and $\tau \mapsto h^*(t_0 - \tau)$:

\[
(x \star h)(t_0) = \bigl\langle \tau \mapsto x(\tau), \; \tau \mapsto h^*(t_0 - \tau) \bigr\rangle. \tag{5.4}
\]

(iii) Another useful informal way is to think about $x \star h$ as a limit of expressions
of the form
\[
\sum_{j} h(t_j) \, x(t - t_j), \tag{5.5}
\]

i.e., as a limit of linear combinations of the time shifts of $x$ where the
coefficients are determined by $h$.
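
The moving-average interpretation in (i) can be illustrated with a short numerical sketch. It is only an approximation under stated assumptions: the integral is replaced by a midpoint sum with `n` points, and the function name `moving_average` and the two input signals are hypothetical choices made here.

```python
def moving_average(x, t0, T, n=2000):
    """Approximate (1/T) * integral of x over [t0 - T/2, t0 + T/2] by a midpoint sum.

    This equals (x * h)(t0) for h: t -> I{|t| <= T/2} / T.
    """
    step = T / n
    start = t0 - T / 2
    return sum(x(start + (k + 0.5) * step) for k in range(n)) * step / T

def unit_step(t):
    """A signal with a jump at t = 0."""
    return 1.0 if t >= 0.0 else 0.0

# At the jump the window covers equal parts of 0 and 1, so the average is 1/2.
smoothed_edge = moving_average(unit_step, 0.0, 1.0)

# For x(t) = t^2 the window average at t0 is t0^2 + T^2 / 12, not x(t0) = t0^2.
smoothed_parabola = moving_average(lambda t: t * t, 1.0, 2.0)
```

The parabola is not energy-limited, yet the average is well defined at every time because the window has finite width.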


5.5 When Is the Convolution Defined? 


There are a number of useful theorems providing sufficient conditions for the
convolution’s existence. These theorems can be classified into two kinds: those that
guarantee that the convolution $x \star h$ is defined at every epoch $t \in \mathbb{R}$ and those
that only guarantee that the convolution is defined for all epochs $t$ outside a set of
Lebesgue measure zero. Both types are useful. We begin with the former.


Convolution defined for every $t \in \mathbb{R}$:


(i) A particularly simple case where the convolution is defined at every time
instant $t$ is when both $x$ and $h$ are energy-limited:

\[
x, h \in \mathcal{L}_2. \tag{5.6a}
\]

In this case we can use (5.4) and the Cauchy-Schwarz Inequality
(Theorem 3.3.1) to conclude that the integral in (5.2) is defined for every $t \in \mathbb{R}$
and that $x \star h$ is a bounded function with

\[
|(x \star h)(t)| \le \|x\|_2 \, \|h\|_2, \quad t \in \mathbb{R}. \tag{5.6b}
\]
Indeed,
\[
\begin{aligned}
|(x \star h)(t)| &= \bigl| \bigl\langle \tau \mapsto x(\tau), \; \tau \mapsto h^*(t - \tau) \bigr\rangle \bigr| \\
&\le \| \tau \mapsto x(\tau) \|_2 \, \| \tau \mapsto h^*(t - \tau) \|_2 \\
&= \|x\|_2 \, \|h\|_2.
\end{aligned}
\]
In fact, it can be shown that the result of convolving two energy-limited
signals is not only bounded but also uniformly continuous.¹ (See, for example,
(Adams and Fournier, 2003, Paragraph 2.23).)


Note that even if both $x$ and $h$ are of finite energy, the convolution $x \star h$
need not be. However, if $x$, $h$ are both of finite energy and if one of them
is additionally also integrable, then the convolution $x \star h$ is a finite-energy
signal. Indeed,

\[
\|x \star h\|_2 \le \|h\|_1 \, \|x\|_2, \quad h \in \mathcal{L}_1 \cap \mathcal{L}_2, \quad x \in \mathcal{L}_2. \tag{5.7}
\]

For a proof see, for example, (Rudin, 1974, Chapter 7, Exercise 4) or (Stein
and Weiss, 1990, Chapter 1, Section 1, Theorem 1.3).


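¹ A function $s \colon \mathbb{R} \to \mathbb{C}$ is said to be uniformly continuous if for every $\epsilon > 0$ there corresponds
some positive $\delta(\epsilon)$ such that $|s(\xi') - s(\xi'')|$ is smaller than $\epsilon$ whenever $\xi', \xi'' \in \mathbb{R}$ are such that
$|\xi' - \xi''| < \delta(\epsilon)$.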

(ii) Another simple case where the convolution is defined at every epoch $t \in \mathbb{R}$ is
when one of the functions is measurable and bounded and when the other is
integrable. For example, if

\[
h \in \mathcal{L}_1 \tag{5.8a}
\]

and if $x$ is a Lebesgue measurable function that is bounded in the sense that
\[
|x(t)| \le \sigma_\infty, \quad t \in \mathbb{R} \tag{5.8b}
\]

for some constant $\sigma_\infty$, then for every $t \in \mathbb{R}$ the integrand in (5.3) is integrable
because $|h(\sigma) \, x(t - \sigma)| \le |h(\sigma)| \, \sigma_\infty$, with the latter being integrable by our
assumption that $h$ is integrable. The result of the convolution is a bounded
function because

\[
\begin{aligned}
|(x \star h)(t)| &= \Bigl| \int_{-\infty}^{\infty} h(\tau) \, x(t - \tau) \, d\tau \Bigr| \\
&\le \int_{-\infty}^{\infty} |h(\tau) \, x(t - \tau)| \, d\tau \\
&\le \sigma_\infty \, \|h\|_1, \quad t \in \mathbb{R},
\end{aligned}
\tag{5.8c}
\]

where the first inequality follows from Proposition 2.4.1, and where the second
inequality follows from (5.8b).


For this case too one can show that the result of the convolution is not only
bounded but also uniformly continuous.


Using Hölder’s Inequality, we can generalize the above two cases to show
that whenever $x$ and $h$ satisfy the assumptions of Hölder’s Inequality, their
convolution is defined at every epoch $t \in \mathbb{R}$ and is, in fact, a bounded
uniformly continuous function. See, for example, (Adams and Fournier, 2003,
Paragraph 2.23).


(iii) Another important case where the convolution is defined at every time instant
will be discussed in Proposition 6.2.5. There it is shown that the convolution
between an integrable function (of time) with the Inverse Fourier Transform
of an integrable function (of frequency) is defined at every time instant and
has a simple representation. This scenario is not as contrived as the reader
might suspect. It arises quite naturally, for example, when discussing the
lowpass filtering of an integrable signal (Section 6.4.2). The impulse response
of an ideal lowpass filter (LPF) is not integrable, but it can be represented
as the Inverse Fourier Transform of an integrable function; see (6.35).
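

As a quick sanity check of the bounds (5.6b) and (5.8c), consider the following worked example. The two signals are chosen here purely for illustration and do not appear in the text; both are energy-limited, $h$ is integrable, and $x$ is bounded, so both bounds apply.

```latex
\[
x(t) = e^{-|t|}, \qquad h(t) = I\{0 \le t \le 2\},
\]
\[
\|x\|_2 = 1, \qquad \|h\|_2 = \sqrt{2}, \qquad \|h\|_1 = 2, \qquad \sigma_\infty = \sup_{t} |x(t)| = 1,
\]
\[
(x \star h)(t) = \int_{t-2}^{t} e^{-|s|} \, ds
\;\le\; \int_{-1}^{1} e^{-|s|} \, ds = 2\bigl(1 - e^{-1}\bigr) \approx 1.264,
\]
\[
1.264 \;\le\; \|x\|_2 \, \|h\|_2 = \sqrt{2} \approx 1.414
\qquad \text{and} \qquad
1.264 \;\le\; \sigma_\infty \, \|h\|_1 = 2 .
\]
```

The maximum is attained at $t = 1$, where the length-2 integration window is centered on the peak of $e^{-|s|}$. In this instance the Cauchy-Schwarz bound (5.6b) is the tighter of the two.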


Regarding theorems that guarantee that the convolution be defined for every t 
outside a set of Lebesgue measure zero, we mention two. 


Convolution defined for t outside a set of Lebesgue measure zero: 


(i) If both $x$ and $h$ are integrable, then one can show (see, for example, (Rudin,
1974, Theorem 7.14), (Katznelson, 1976, Section VI.1), or (Stein and Weiss,
1990, Chapter 1, Section 1, Theorem 1.3)) that, for all $t$ outside a set of
Lebesgue measure zero, the mapping $\tau \mapsto x(\tau) \, h(t - \tau)$ is integrable, so for
all such $t$ the function $(x \star h)(t)$ is defined. Moreover, irrespective of how we
define $(x \star h)(t)$ for $t$ inside the set of Lebesgue measure zero,

\[
\|x \star h\|_1 \le \|x\|_1 \, \|h\|_1, \quad x, h \in \mathcal{L}_1. \tag{5.9}
\]

What is nice about this case is that the result of the convolution stays in
the same class of integrable functions. This makes it meaningful to discuss
associativity and other important properties of the convolution.


(ii) Another case where the convolution is defined for all $t$ outside a set of
Lebesgue measure zero is when $h$ is integrable and when $x$ is a measurable
function for which $\tau \mapsto |x(\tau)|^p$ is integrable for some $1 \le p < \infty$. In
this case we have (see, for example, (Rudin, 1974, Exercise 7.4) or (Stein and
Weiss, 1990, Chapter 1, Section 1, Theorem 1.3)) that for all $t$ outside a set
of Lebesgue measure zero the mapping $\tau \mapsto x(\tau) \, h(t - \tau)$ is integrable so for
such $t$ the convolution $(x \star h)(t)$ is well-defined. Moreover, irrespective of
how we define $(x \star h)(t)$ for $t$ inside the set of Lebesgue measure zero,

\[
\Bigl( \int_{-\infty}^{\infty} |(x \star h)(t)|^p \, dt \Bigr)^{1/p} \le \|h\|_1 \Bigl( \int_{-\infty}^{\infty} |x(t)|^p \, dt \Bigr)^{1/p}. \tag{5.10}
\]

This is written more compactly as
\[
\|x \star h\|_p \le \|h\|_1 \, \|x\|_p, \quad p \ge 1, \tag{5.11}
\]

where we use the notation that for any measurable function $g$ and $p > 0$

\[
\|g\|_p \triangleq \Bigl( \int_{-\infty}^{\infty} |g(t)|^p \, dt \Bigr)^{1/p}. \tag{5.12}
\]

5.6 Basic Properties of the Convolution 


The main properties of the convolution are summarized in the following theorem. 


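Theorem 5.6.1 (Properties of the Convolution). The convolution is

\[
\begin{aligned}
x \star h &= h \star x, && \text{(commutative)} \\
(x \star g) \star h &= x \star (g \star h), && \text{(associative)} \\
x \star (g + h) &= x \star g + x \star h, && \text{(distributive)}
\end{aligned}
\]

and linear in each of its arguments

\[
\begin{aligned}
x \star (\alpha g + \beta h) &= \alpha (x \star g) + \beta (x \star h), \\
(\alpha g + \beta h) \star x &= \alpha (g \star x) + \beta (h \star x),
\end{aligned}
\]
where the above hold for all $g, h, x \in \mathcal{L}_1$, and $\alpha, \beta \in \mathbb{C}$.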


Some of these properties hold under more general or different sets of assumptions 
so the reader should focus here on the properties rather than on the restrictions. 
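
These identities have exact counterparts for finite sequences, which makes them easy to check by machine. The sketch below uses the discrete convolution of lists of integers as a stand-in for the integral in (5.2); this discrete analog, the helper names `conv` and `add`, and the particular sequences are assumptions made for illustration and are not the setting of the theorem.

```python
def conv(a, b):
    """Discrete convolution of two finite sequences (lists indexed from 0)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def add(a, b):
    """Termwise sum of two sequences, padding the shorter one with zeros."""
    n = max(len(a), len(b))
    return [(a[k] if k < len(a) else 0) + (b[k] if k < len(b) else 0) for k in range(n)]

x = [1, 2, 3]
g = [0, 1, -1]
h = [2, 0, 1, 4]

commutative = conv(x, h) == conv(h, x)
associative = conv(conv(x, g), h) == conv(x, conv(g, h))
distributive = conv(x, add(g, h)) == add(conv(x, g), conv(x, h))
```

Integer entries keep every comparison exact, so the three flags test the identities themselves rather than their floating-point approximations.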


5.7 Filters


A filter of impulse response $h$ is a physical device that when fed the input
waveform $x$ produces the output waveform $h \star x$. The impulse response $h$ is
assumed to be a real or complex signal, and it is tacitly assumed that we only feed
the device with inputs $x$ for which the convolution $x \star h$ is defined.²


Definition 5.7.1 (Stable Filter). A filter is said to be stable if its impulse response 
is integrable. 


Stable filters are also called bounded-input/bounded-output stable or BIBO
stable, because, as the next proposition shows, if such filters are fed a bounded
signal, then their output is also a bounded signal.


Proposition 5.7.2 (BIBO Stability). If $h$ is integrable and if $x$ is a bounded
Lebesgue measurable signal, then the signal $x \star h$ is also bounded.


Proof. If the impulse response $h$ is integrable, and if the input $x$ is bounded by
some constant $\sigma_\infty$, then (5.8a) and (5.8b) are both satisfied, and the boundedness
of the output then follows from (5.8c).
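
The bound behind this proof can be spelled out in one chain of inequalities. The following is a sketch that uses only the definition of the convolution and a constant $\sigma_\infty$ bounding $|x|$ (the symbol name is an assumption here; the numbered conditions (5.8a) to (5.8c) appear earlier in the chapter and are not reproduced):

```latex
\begin{align*}
\bigl|(x \star h)(t)\bigr|
  &= \Bigl|\int_{-\infty}^{\infty} x(\tau)\, h(t-\tau)\, d\tau\Bigr| \\
  &\le \int_{-\infty}^{\infty} |x(\tau)|\, |h(t-\tau)|\, d\tau \\
  &\le \sigma_\infty \int_{-\infty}^{\infty} |h(t-\tau)|\, d\tau
   = \sigma_\infty\, \|h\|_1, \qquad t \in \mathbb{R}.
\end{align*}
```

The output is thus bounded by the input bound times the $\mathcal{L}_1$ norm of the impulse response.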


Definition 5.7.3 (Causal Filter). A filter of impulse response $h$ is said to be causal
or nonanticipative if $h$ is zero at negative times, i.e., if


$$ h(t) = 0, \quad t < 0. \tag{5.13} $$


Causal filters play an important role in engineering because (5.13) guarantees that 
the present filter output be computable from the past filter inputs. Indeed, the 
time-t filter output can be expressed in the form 


$$ (x \star h)(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau $$

$$ \phantom{(x \star h)(t)} = \int_{-\infty}^{t} x(\tau)\, h(t - \tau)\, d\tau, \quad h \text{ causal}, $$


where the calculation of the latter integral only requires knowledge of $x(\tau)$ for
$\tau \le t$. Here the first equality follows from the definition of the convolution (5.2),
and the second equality follows from (5.13).
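
A small numerical experiment illustrates the point. The sketch below is an illustration under assumptions (sampled signals, Riemann sums, and two hypothetical impulse responses chosen for the example): two inputs that agree up to a given time produce the same output at that time through a causal filter, but not through a filter whose impulse response is nonzero at negative times.

```python
import math

dt = 0.01  # sampling step (arbitrary for this sketch)

def h_causal(n):
    """Samples of t -> exp(-t) I{t >= 0} at t = n*dt: zero at negative times."""
    return math.exp(-n * dt) if n >= 0 else 0.0

def h_two_sided(n):
    """Samples of t -> exp(-|t|): nonzero at negative times, hence not causal."""
    return math.exp(-abs(n) * dt)

def output_at(x, h, k):
    """Riemann-sum approximation of (x * h)(k*dt), summing x[j] * h(k - j) over all j."""
    return dt * sum(x[j] * h(k - j) for j in range(len(x)))

# Two inputs that agree up to time index 99 and differ afterwards.
x1 = [1.0] * 200
x2 = [1.0] * 100 + [5.0] * 100

k = 99
causal_same = output_at(x1, h_causal, k) == output_at(x2, h_causal, k)
two_sided_same = output_at(x1, h_two_sided, k) == output_at(x2, h_two_sided, k)
```

Here `causal_same` is expected to be true because every term involving a future input sample is multiplied by a zero of the impulse response, whereas `two_sided_same` is expected to be false.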


5.8 The Matched Filter 


In Digital Communications inner products are often computed using a matched 
filter. In its definition we shall use the notation (5.1). 


² This definition of a filter is reminiscent of the concept of a “linear time invariant system.”
Note, however, that since we do not deal with Dirac’s Delta in this book, our definition is more
restrictive. For example, a device that produces at its output a waveform that is identical to its
input is excluded from our discussion here because we do not allow h to be Dirac’s Delta.

Definition 5.8.1 (The Matched Filter). The matched filter for the signal $\phi$ is
a filter whose impulse response is $\vec{\phi}^{*}$, i.e., the mapping


$$ t \mapsto \phi^{*}(-t). \tag{5.14} $$


The main use of the matched filter is for computing inner products: 


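Theorem 5.8.2 (Computing Inner Products with a Matched Filter). The inner
product $\langle u, \phi \rangle$ between the energy-limited signals $u$ and $\phi$ is given by the output at
time $t = 0$ of a matched filter for $\phi$ that is fed $u$:


$$ \langle u, \phi \rangle = \bigl(u \star \vec{\phi}^{*}\bigr)(0), \quad u, \phi \in \mathcal{L}_2. \tag{5.15} $$


More generally, if $g\colon t \mapsto \phi(t - t_0)$, then $\langle u, g \rangle$ is the time-$t_0$ output corresponding
to feeding the waveform $u$ to the matched filter for $\phi$:


$$ \int_{-\infty}^{\infty} u(t)\, \phi^{*}(t - t_0)\, dt = \bigl(u \star \vec{\phi}^{*}\bigr)(t_0). \tag{5.16} $$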


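Proof. We shall prove the second part of the theorem, i.e., (5.16); the first follows
from the second by setting $t_0 = 0$. We express the time-$t_0$ output of the matched
filter as:


$$ \bigl(u \star \vec{\phi}^{*}\bigr)(t_0) = \int_{-\infty}^{\infty} u(\tau)\, \vec{\phi}^{*}(t_0 - \tau)\, d\tau $$

$$ \phantom{\bigl(u \star \vec{\phi}^{*}\bigr)(t_0)} = \int_{-\infty}^{\infty} u(\tau)\, \phi^{*}(\tau - t_0)\, d\tau, $$


where the first equality follows from the definition of convolution (5.2) and the
second from the definition of $\vec{\phi}^{*}$ as the conjugated mirror image of $\phi$.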


From the above theorem we see that if we wish to compute, say, the three inner
products $\langle u, g_1 \rangle$, $\langle u, g_2 \rangle$, and $\langle u, g_3 \rangle$ in the very special case where the functions
$g_1, g_2, g_3$ are all time shifts of the same waveform $\phi$, i.e., when $g_1\colon t \mapsto \phi(t - t_1)$,
$g_2\colon t \mapsto \phi(t - t_2)$, and $g_3\colon t \mapsto \phi(t - t_3)$, then we need only one filter, namely, the
matched filter for $\phi$. Indeed, we can feed $u$ to the matched filter for $\phi$ and the
inner products $\langle u, g_1 \rangle$, $\langle u, g_2 \rangle$, and $\langle u, g_3 \rangle$ simply correspond to the filter’s outputs
at times $t_1$, $t_2$, and $t_3$. One circuit computes all three inner products. This is so
exciting that it is worth repeating:


Corollary 5.8.3 (Computing Many Inner Products using One Filter). If the
energy-limited signals $\{g_j\}$ are all time shifts of the same signal $\phi$ in the sense
that


$$ g_j\colon t \mapsto \phi(t - t_j), \quad j = 1, \ldots, J, $$


and if $u$ is any energy-limited signal, then all $J$ inner products


$$ \langle u, g_j \rangle, \quad j = 1, \ldots, J $$

can be computed using one filter by feeding $u$ to a matched filter for $\phi$ and sampling
the output at the appropriate times $t_1, \ldots, t_J$:


$$ \langle u, g_j \rangle = \bigl(u \star \vec{\phi}^{*}\bigr)(t_j), \quad j = 1, \ldots, J. \tag{5.17} $$
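
The corollary can be illustrated numerically. The following is a minimal sketch under assumptions: the pulse `phi`, the input `u`, the step `dt`, and the shifts are all hypothetical choices, and integrals are replaced by Riemann sums on a common grid. One pass of `u` through the sampled matched filter is computed, and its output at several times is compared with inner products computed directly.

```python
import cmath
import math

def convolve(a, b):
    """Full discrete convolution of two complex sequences."""
    c = [0j] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

dt = 0.01
N = 50
# phi: a complex pulse supported on [0, N*dt); chosen arbitrarily.
phi = [(1 - n / N) * cmath.exp(2j * math.pi * 3 * n * dt) for n in range(N)]
# u: an arbitrary complex input observed on [0, 3).
u = [math.sin(0.05 * n) + 0.3j * math.cos(0.11 * n) for n in range(300)]

# Matched-filter impulse response t -> phi*(-t), sampled at lags -(N-1), ..., 0.
matched = [p.conjugate() for p in reversed(phi)]
filter_output = [dt * c for c in convolve(u, matched)]

def matched_filter_sample(shift):
    """Filter output at time shift*dt; the offset N-1 accounts for the negative lags."""
    return filter_output[shift + N - 1]

def inner_product(shift):
    """<u, g> with g: t -> phi(t - shift*dt), computed directly as a Riemann sum."""
    return dt * sum(u[n] * phi[n - shift].conjugate() for n in range(shift, shift + N))

shifts = [0, 40, 125]
gaps = [abs(matched_filter_sample(s) - inner_product(s)) for s in shifts]
```

The filter output is computed once; each additional inner product costs only one more sample of `filter_output`.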


5.9 The Ideal Unit-Gain Lowpass Filter 


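The impulse response of the ideal unit-gain lowpass filter of cutoff frequency $W_c$
is denoted by $\mathrm{LPF}_{W_c}(\cdot)$ and is given for every $W_c > 0$ by³


$$ \mathrm{LPF}_{W_c}(t) = \begin{cases} 2W_c\, \dfrac{\sin(2\pi W_c t)}{2\pi W_c t} & \text{if } t \neq 0, \\ 2W_c & \text{if } t = 0, \end{cases} \quad t \in \mathbb{R}. \tag{5.18} $$

This can be alternatively written as

$$ \mathrm{LPF}_{W_c}(t) = 2W_c\, \operatorname{sinc}(2W_c t), \quad t \in \mathbb{R}, \tag{5.19} $$

where the function sinc(·) is defined by⁴

$$ \operatorname{sinc}(\xi) = \begin{cases} \dfrac{\sin(\pi \xi)}{\pi \xi} & \text{if } \xi \neq 0, \\ 1 & \text{if } \xi = 0, \end{cases} \quad \xi \in \mathbb{R}. \tag{5.20} $$


Notice that the definition of sinc(0) as being 1 makes sense because, for very small
(but nonzero) values of $\xi$ the value of $\sin(\pi\xi)/(\pi\xi)$ is approximately 1. In fact, with
this definition at zero the function is not only continuous at zero but also infinitely
differentiable there. Indeed, the function from $\mathbb{C}$ to $\mathbb{C}$


$$ z \mapsto \begin{cases} \dfrac{\sin(\pi z)}{\pi z} & \text{if } z \neq 0, \\ 1 & \text{otherwise}, \end{cases} $$


is an entire function, i.e., an analytic function throughout the complex plane.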


The importance of the ideal unit-gain lowpass filter will become clearer when we
discuss the filter’s frequency response in Section 6.3. It is thus named because
the Fourier Transform of $\mathrm{LPF}_{W_c}(\cdot)$ is equal to 1 (hence “unit gain”), whenever
$|f| \le W_c$, and is equal to zero, whenever $|f| > W_c$. See (6.38) ahead.


From a mathematical point of view, working with the ideal unit-gain lowpass filter
is tricky because the impulse response (5.18) is not an integrable function. (It
decays like $1/t$, which does not have a finite integral from $t = 1$ to $t = \infty$.) This
filter is thus not a stable filter. We shall revisit this issue in Section 6.4. Note,
however, that the impulse response (5.18) is of finite energy. (The square of the
impulse response decays like $1/t^2$, which does have a finite integral from one to
infinity.) Consequently, the result of feeding an energy-limited signal to the ideal
unit-gain lowpass filter is always well-defined.


Note also that the ideal unit-gain lowpass filter is not causal. 
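
The contrast between "not integrable" and "of finite energy" can be seen numerically. The sketch below is an illustration only: the cutoff `wc = 1.0`, the step size, and the truncation widths are arbitrary assumptions. It evaluates (5.19) and approximates the integrals of the impulse response's magnitude and of its square over growing intervals.

```python
import math

def sinc(xi):
    """sinc as in (5.20): sin(pi*xi)/(pi*xi), with the value 1 at xi = 0."""
    return 1.0 if xi == 0 else math.sin(math.pi * xi) / (math.pi * xi)

def lpf(t, wc):
    """Impulse response of the ideal unit-gain lowpass filter, form (5.19)."""
    return 2 * wc * sinc(2 * wc * t)

def lpf_piecewise(t, wc):
    """The same impulse response written as in (5.18)."""
    if t == 0:
        return 2 * wc
    return 2 * wc * math.sin(2 * math.pi * wc * t) / (2 * math.pi * wc * t)

def riemann(f, half_width, dt=0.01):
    """Riemann sum of f over [-half_width, half_width]."""
    n = round(half_width / dt)
    return dt * sum(f(k * dt) for k in range(-n, n + 1))

wc = 1.0  # cutoff frequency (arbitrary choice)
forms_gap = max(abs(lpf(t, wc) - lpf_piecewise(t, wc)) for t in [0.0, 0.1, -0.37, 2.5])

energy_20 = riemann(lambda t: lpf(t, wc) ** 2, 20)
energy_200 = riemann(lambda t: lpf(t, wc) ** 2, 200)
abs_integral_20 = riemann(lambda t: abs(lpf(t, wc)), 20)
abs_integral_200 = riemann(lambda t: abs(lpf(t, wc)), 200)
```

Widening the interval tenfold should barely change the energy estimate, which settles near $2W_c$, while the integral of the magnitude keeps growing (roughly logarithmically in the interval width).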


³ For convenience we define the impulse response of the ideal unit-gain lowpass filter of cutoff
frequency zero as the all zero signal. This is in agreement with (5.19).
⁴ Some texts omit the $\pi$’s in (5.20) and define the sinc(·) function as $\sin(\xi)/\xi$ for $\xi \neq 0$.

5.10 The Ideal Unit-Gain Bandpass Filter


The ideal unit-gain bandpass filter (BPF) of bandwidth $W$ around the carrier
frequency $f_c$, where $f_c > W/2 > 0$, is a filter of impulse response $\mathrm{BPF}_{W, f_c}(\cdot)$,
where


$$ \mathrm{BPF}_{W, f_c}(t) = 2W \cos(2\pi f_c t)\, \operatorname{sinc}(W t), \quad t \in \mathbb{R}. \tag{5.21} $$


This filter too is nonstable and noncausal. It derives its name from its frequency
response (discussed in Section 6.3 ahead), which is equal to one at frequencies $f$
satisfying $\bigl| |f| - f_c \bigr| \le W/2$ and which is equal to zero at all other frequencies.


5.11 Young’s Inequality 


Many of the inequalities regarding convolutions are special cases of a result known 
as Young’s Inequality. Recalling (5.12), we can state Young’s Inequality as follows. 


Theorem 5.11.1 (Young's Inequality). Let $x$ and $h$ be measurable functions such
that $\|x\|_p, \|h\|_q < \infty$ for some $1 \le p, q < \infty$ satisfying $1/p + 1/q > 1$. Define $r$
through $1/p + 1/q = 1 + 1/r$. Then the convolution integral (5.2) is defined for all $t$
outside a set of Lebesgue measure zero; it is a measurable function; and


$$ \|x \star h\|_r \le K\, \|x\|_p\, \|h\|_q, \tag{5.22} $$


where $K \le 1$ is some constant that depends only on $p$ and $q$.


Proof. See (Adams and Fournier, 2003, Corollary 2.25). Alternatively, see (Stein 
and Weiss, 1990, Chapter 5, Section 1) where it is derived from the M. Riesz 
Convexity Theorem. 
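
As a quick check of how the exponents combine, the following worked cases recover the inequalities (5.9) and (5.11) from (5.22). This is a sketch that assumes the endpoint values $p = 1$ and $q = 1$ are admitted by the theorem and that uses only $K \le 1$:

```latex
\begin{align*}
p = q = 1:\quad & \tfrac{1}{1} + \tfrac{1}{1} = 1 + \tfrac{1}{r}
   \;\Rightarrow\; r = 1,
   & \|x \star h\|_1 &\le \|x\|_1\, \|h\|_1 \quad \text{(cf. (5.9))}, \\
q = 1:\quad & \tfrac{1}{p} + 1 = 1 + \tfrac{1}{r}
   \;\Rightarrow\; r = p,
   & \|x \star h\|_p &\le \|x\|_p\, \|h\|_1 \quad \text{(cf. (5.11))}.
\end{align*}
```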


5.12 Additional Reading 


For some of the properties of the convolution and its use in the analysis of linear 
systems see (Oppenheim and Willsky, 1997) and (Kwakernaak and Sivan, 1991). 


5.13 Exercises


Exercise 5.1 (Convolution of Delayed Signals). Let $x$ and $h$ be energy-limited signals.
Let $x_d\colon t \mapsto x(t - t_d)$ be the result of delaying $x$ by some $t_d \in \mathbb{R}$. Show that


$$ (x_d \star h)(t) = (x \star h)(t - t_d), \quad t \in \mathbb{R}. $$


Exercise 5.2 (The Convolution of Reflections). Let the signals $x, y$ be such that their
convolution $(x \star y)(t)$ is defined at every $t \in \mathbb{R}$. Show that the convolution of their
reflections is also defined at every $t \in \mathbb{R}$ and that it is equal to the reflection of their
convolution:


$$ (\vec{x} \star \vec{y})(t) = (x \star y)(-t), \quad t \in \mathbb{R}. $$

Exercise 5.3 (Convolving Brickwall Functions). For a given $a > 0$, compute the
convolution of the signal $t \mapsto \mathrm{I}\{|t| \le a\}$ with itself.


Exercise 5.4 (The Convolution and Inner Products). Let $y$ and $\phi$ be energy-limited
complex signals, and let $h$ be an integrable complex signal. Argue that


$$ \langle y, h \star \phi \rangle = \langle y \star \vec{h}^{*}, \phi \rangle. $$


Exercise 5.5 (The Convolution’s Derivative). Let the signal $g\colon \mathbb{R} \to \mathbb{C}$ be differentiable,
and let $g'$ denote its derivative. Let $h\colon \mathbb{R} \to \mathbb{C}$ be another signal. Assume that $g$, $g'$,
and $h$ are all bounded, continuous, and integrable. Show that $g \star h$ is differentiable and
that its derivative $(g \star h)'$ is given by $g' \star h$.


See (Korner, 1988, Chapter 53, Theorem 53.1). 


Exercise 5.6 (Continuity of the Convolution). Show that if the signals $x$ and $y$ are both
in $\mathcal{L}_2$ then their convolution is a continuous function.


Hint: Use the Cauchy-Schwarz Inequality and the fact that if $x \in \mathcal{L}_2$ and if we define
$x_\delta\colon t \mapsto x(t - \delta)$, then $\lim_{\delta \to 0} \|x - x_\delta\|_2 = 0$.


Exercise 5.7 (More on the Continuity of the Convolution). Let $x$ and $y$ be in $\mathcal{L}_2$. Let the
sequence of energy-limited signals $x_1, x_2, \ldots$ converge to $x$ in the sense that $\|x - x_n\|_2$
tends to zero as $n$ tends to infinity. Show that at every epoch $t \in \mathbb{R}$,


$$ \lim_{n \to \infty} (x_n \star y)(t) = (x \star y)(t). $$


Hint: Use the Cauchy-Schwarz Inequality.


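Exercise 5.8 (Convolving Bi-Infinite Sequences). The convolution of the bi-infinite
sequence $\ldots, a_{-1}, a_0, a_1, \ldots$ with the bi-infinite sequence $\ldots, b_{-1}, b_0, b_1, \ldots$ is the bi-infinite
sequence $\ldots, c_{-1}, c_0, c_1, \ldots$ formally defined by

$$ c_m = \sum_{\nu = -\infty}^{\infty} a_\nu\, b_{m - \nu}, \quad m \in \mathbb{Z}. \tag{5.23} $$


Show that if


$$ \sum_{\nu = -\infty}^{\infty} |a_\nu|, \ \sum_{\nu = -\infty}^{\infty} |b_\nu| < \infty, $$


then the sum on the RHS of (5.23) converges for every integer $m$, and


$$ \sum_{m = -\infty}^{\infty} |c_m| \le \left( \sum_{\nu = -\infty}^{\infty} |a_\nu| \right) \left( \sum_{\nu = -\infty}^{\infty} |b_\nu| \right). $$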


Hint: Recall Problems 3.10 & 8.9 and the Triangle Inequality for Complex Numbers. 


Exercise 5.9 (Stability of the Matched Filter). Let $g$ be an energy-limited signal. Under
what conditions is the matched filter for $g$ stable?

Exercise 5.10 (Causality of the Matched Filter). Let $g$ be an energy-limited signal.


(i) Under what conditions is the matched filter for $g$ causal?


(ii) Under what conditions can you find a causal filter of impulse response $h$ and a
sampling time $t_0$ such that


$$ (r \star h)(t_0) = \langle r, g \rangle, \quad r \in \mathcal{L}_2\,? $$


(iii) Show that for every $\delta > 0$ we can find a stable causal filter of impulse response $h$
and a sampling epoch $t_0$ such that for every $r \in \mathcal{L}_2$


$$ \bigl| (r \star h)(t_0) - \langle r, g \rangle \bigr| < \delta\, \|r\|_2. $$


Exercise 5.11 (The Output of the Matched Filter). Compute and plot the output of the
matched filter for the signal $t \mapsto e^{-t}\, \mathrm{I}\{t \ge 0\}$ when it is fed the input $t \mapsto \mathrm{I}\{|t| \le 1/2\}$.


Chapter 6 


The Frequency Response of Filters and 
Bandlimited Signals 


6.1 Introduction 


We begin this chapter with a review of the Fourier Transform and its key properties. 
We then use these properties to define the frequency response of filters, to discuss 
the ideal unit-gain lowpass filter, and to define bandlimited signals. 


6.2 Review of the Fourier Transform 


6.2.1 On Hats, 2π’s, ω’s, and f’s


We denote the Fourier Transform (FT) of a (possibly complex) signal $x(\cdot)$ by
$\hat{x}(\cdot)$. Some other books denote it by $X(\cdot)$, but we prefer our notation because,
where possible, we use lowercase letters for deterministic quantities and reserve
uppercase letters for random quantities. In places where convention forces us to
use uppercase letters for deterministic quantities, we try to use a special font, e.g.,
$\mathsf{P}$ for power, $\mathsf{W}$ for bandwidth, or $\mathsf{A}$ for a deterministic matrix.


More importantly, our definition of the Fourier Transform may be different from 
the one you are used to. 


Definition 6.2.1 (Fourier Transform). The Fourier Transform (or the $\mathcal{L}_1$-Fourier
Transform) of an integrable signal $x\colon \mathbb{R} \to \mathbb{C}$ is the mapping $\hat{x}\colon \mathbb{R} \to \mathbb{C}$
defined by


$$ \hat{x}\colon f \mapsto \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt. \tag{6.1} $$

(The FT can also be defined in more general settings. For example, in Section 6.2.3
it will be defined via a limiting argument for finite-energy signals that are not
integrable.)

This definition should be contrasted with the definition

$$ X(i\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-i \omega t}\, dt, \tag{6.2} $$


which you may have seen before. Note the $2\pi$, which appears in the exponent in
our definition (6.1) and not in (6.2). We apologize to readers who are used to (6.2)
for forcing a new definition, but we have some good reasons:


(i) With our definition, the transform and its inverse are very similar; see (6.1)
and (6.4) below. If one uses the definition of (6.2), then the expression for
the Inverse Fourier Transform requires scaling the integral by $1/(2\pi)$.


(ii) With our definition, the Fourier Transform and the Inverse Fourier Transform
of a symmetric function are the same; see (6.6). This simplifies the
memorization of some Fourier pairs.


(iii) As we shall state more precisely in Section 6.2.2 and Section 6.2.3, with our
definition the Fourier Transform possesses an extremely important property:
it preserves inner products

$$ \langle u, v \rangle = \langle \hat{u}, \hat{v} \rangle \quad \text{(certain restrictions apply)}. $$

Again, no $2\pi$’s.


(iv) If $x(\cdot)$ models a function of time, then $\hat{x}(\cdot)$ becomes a function of frequency.
Thus, it is natural to use the generic argument $t$ for such signals $x(\cdot)$ and the
generic argument $f$ for their transforms. It is more common these days to
describe tones in terms of their frequencies (i.e., in Hz) and not in terms of
their radial frequency (in radians per second).


(v) It seems that all books on communications use our definition, perhaps because 
people are used to setting their radios in Hz, kHz, or MHz. 
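
To make the convention (6.1) concrete, the following sketch approximates the transform by a Riemann sum and applies it to the signal $t \mapsto e^{-\pi t^2}$, which under this convention is known to be its own Fourier Transform, with no $2\pi$ scaling factors appearing. The function names, the truncation width, and the step size are assumptions made for this illustration.

```python
import cmath
import math

def fourier_transform(x, f, half_width=6.0, dt=0.01):
    """Riemann-sum approximation of (6.1): the integral of x(t) exp(-i 2 pi f t) dt."""
    n = round(half_width / dt)
    return dt * sum(
        x(k * dt) * cmath.exp(-2j * math.pi * f * k * dt) for k in range(-n, n + 1)
    )

def gaussian(t):
    """The signal t -> exp(-pi t^2)."""
    return math.exp(-math.pi * t * t)

frequencies = [0.0, 0.5, 1.0, -1.0]
errors = [abs(fourier_transform(gaussian, f) - gaussian(f)) for f in frequencies]
```

With the angular-frequency convention (6.2) the same signal would instead transform to a rescaled Gaussian, which is one way to see the bookkeeping that the present definition avoids.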


Plotting the FT of a signal is tricky, because it is a complex-valued function. This
is generally true even for real signals. However, for any integrable real signal
$x\colon \mathbb{R} \to \mathbb{R}$ the Fourier Transform $\hat{x}(\cdot)$ is conjugate-symmetric, i.e.,


$$ \bigl( \hat{x}(-f) = \hat{x}^{*}(f), \quad f \in \mathbb{R} \bigr), \quad x \in \mathcal{L}_1 \text{ is real-valued}. \tag{6.3} $$


Equivalently, the magnitude of the FT of an integrable real signal is symmetric, and
the argument is anti-symmetric.¹ (The reverse statement is “essentially” correct.
If $\hat{x}$ is conjugate-symmetric then the set of epochs $t$ for which $x(t)$ is not real is
of Lebesgue measure zero.) Consequently, when plotting the FT of a “generic”
real signal we shall plot a symmetric function, but with solid lines for the positive
frequencies and dashed lines for the negative frequencies. This is to remind the
reader that the FT of a real signal is not symmetric but conjugate symmetric. See,
for example, Figures 7.1 and 7.2 for plots of the Fourier Transforms of real signals.


¹ The argument of a nonzero complex number $z$ is defined as the element $\theta$ of $[-\pi, \pi)$ such
that $z = |z|\, e^{i\theta}$.




When plotting the FT of a complex-valued signal, we shall use a generic plot that 
is “highly asymmetric,” using solid lines. See, for example, Figure 7.4 for the FT 
of a complex signal. 


Definition 6.2.2 (Inverse Fourier Transform). The Inverse Fourier Transform
(IFT) of an integrable function $g\colon \mathbb{R} \to \mathbb{C}$ is denoted by $\check{g}$ and is defined by


$$ \check{g}\colon t \mapsto \int_{-\infty}^{\infty} g(f)\, e^{i 2\pi f t}\, df. \tag{6.4} $$


We emphasize that the word “inverse” here is just part of the name of the transform.
Applying the IFT to the FT of a signal does not always recover the signal.² (Conditions
under which the IFT does recover the signal are explored in Theorem 6.2.13.)
However, if one does not insist on using the IFT, then every integrable signal can
be reconstructed to within indistinguishability from its FT; see Theorem 6.2.12.


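Proposition 6.2.3 (Some Properties of the Inverse Fourier Transform).

(i) If $g$ is integrable, then its IFT is the FT of its mirror image

$$ \check{g} = \hat{\vec{g}}, \quad g \in \mathcal{L}_1. \tag{6.5} $$


(ii) If $g$ is integrable and also symmetric in the sense that $\vec{g} = g$, then the IFT
of $g$ is equal to its FT

$$ \check{g} = \hat{g}, \quad \bigl( g \in \mathcal{L}_1 \text{ and } \vec{g} = g \bigr). \tag{6.6} $$

(iii) If $g$ is integrable and $\check{g}$ is also integrable, then

$$ \hat{\check{g}} = \check{\hat{g}}. \tag{6.7} $$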


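Proof. Part (i) follows by a simple change of integration variable:


$$ \hat{\vec{g}}(\xi) = \int_{-\infty}^{\infty} \vec{g}(\alpha)\, e^{-i 2\pi \alpha \xi}\, d\alpha = -\int_{\infty}^{-\infty} g(\beta)\, e^{i 2\pi \beta \xi}\, d\beta $$

$$ \phantom{\hat{\vec{g}}(\xi)} = \int_{-\infty}^{\infty} g(\beta)\, e^{i 2\pi \beta \xi}\, d\beta $$

$$ \phantom{\hat{\vec{g}}(\xi)} = \check{g}(\xi), \quad \xi \in \mathbb{R}, $$


where we have changed the integration variable to $\beta \triangleq -\alpha$.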


² This can be seen by considering the signal $t \mapsto \mathrm{I}\{t = 17\}$, which is zero everywhere except
at 17 where it takes on the value 1. Its FT is zero at all frequencies, but if one applies the IFT to
the all-zero function one obtains the all-zero function, which is not the function we started with.
Things could be much worse. The FT of some integrable signals (such as the signal $t \mapsto \mathrm{I}\{|t| \le 1\}$)
is not integrable, so the IFT of their FT is not even defined.

Part (ii) is a special case of Part (i). To prove Part (iii) we compute


$$ \hat{\check{g}}(\xi) = \int_{-\infty}^{\infty} \check{g}(t)\, e^{-i 2\pi \xi t}\, dt $$

$$ \phantom{\hat{\check{g}}(\xi)} = \int_{-\infty}^{\infty} \hat{g}(-t)\, e^{-i 2\pi \xi t}\, dt $$

$$ \phantom{\hat{\check{g}}(\xi)} = \int_{-\infty}^{\infty} \hat{g}(\tau)\, e^{i 2\pi \xi \tau}\, d\tau = \check{\hat{g}}(\xi), \quad \xi \in \mathbb{R}, $$


where we have used $\check{g}(t) = \hat{g}(-t)$ and changed the integration variable to $\tau = -t$.


Identity (6.6) will be useful in Section 6.2.5 when we memorize the FT of the
Brickwall function $\xi \mapsto \beta\, \mathrm{I}\{|\xi| \le \gamma\}$, which is symmetric. Once we succeed we will
also know its IFT.


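Table 6.1 summarizes some of the properties of the FT. Note that some of these
properties require additional technical assumptions.

Property                                        Function                                 Fourier Transform
linearity                                       $\alpha x + \beta y$                     $\alpha \hat{x} + \beta \hat{y}$
time shifting                                   $t \mapsto x(t - t_0)$                   $f \mapsto e^{-i 2\pi f t_0}\, \hat{x}(f)$
frequency shifting                              $t \mapsto e^{i 2\pi f_0 t}\, x(t)$      $f \mapsto \hat{x}(f - f_0)$
conjugation                                     $t \mapsto x^{*}(t)$                     $f \mapsto \hat{x}^{*}(-f)$
stretching ($\alpha \in \mathbb{R}$, $\alpha \neq 0$)   $t \mapsto x(\alpha t)$          $f \mapsto \frac{1}{|\alpha|}\, \hat{x}(f/\alpha)$
convolution in time                             $x \star y$                              $f \mapsto \hat{x}(f)\, \hat{y}(f)$
multiplication in time                          $t \mapsto x(t)\, y(t)$                  $\hat{x} \star \hat{y}$
real part                                       $t \mapsto \mathrm{Re}(x(t))$            $f \mapsto \frac{1}{2} \hat{x}(f) + \frac{1}{2} \hat{x}^{*}(-f)$
time reflection                                 $\vec{x}$                                $\check{x}$
transforming twice                              $\hat{x}$                                $\vec{x}$
FT of IFT                                       $\check{x}$                              $x$


Table 6.1: Basic properties of the Fourier Transform. Some restrictions apply!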
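
A few rows of the table can be spot-checked numerically. The sketch below is an illustration under assumptions: the test signal, the shift `t0`, the stretch factor `alpha`, and the Riemann-sum transform are all choices made for the example. It checks time shifting, stretching with a negative factor, and the conjugate symmetry (6.3) for a real signal.

```python
import cmath
import math

def fourier_transform(x, f, half_width=6.0, dt=0.01):
    """Riemann-sum approximation of the integral of x(t) exp(-i 2 pi f t) dt."""
    n = round(half_width / dt)
    return dt * sum(
        x(k * dt) * cmath.exp(-2j * math.pi * f * k * dt) for k in range(-n, n + 1)
    )

def x(t):
    """A real, rapidly decaying signal that is not symmetric in time."""
    return (1 + t) * math.exp(-math.pi * t * t)

t0 = 0.7      # delay used for the time-shifting row (arbitrary)
alpha = -2.0  # stretch factor, negative to exercise the |alpha| in the table
f = 0.4       # frequency at which the identities are compared (arbitrary)

x_hat = fourier_transform(x, f)

shift_gap = abs(
    fourier_transform(lambda t: x(t - t0), f)
    - cmath.exp(-2j * math.pi * f * t0) * x_hat
)
stretch_gap = abs(
    fourier_transform(lambda t: x(alpha * t), f)
    - fourier_transform(x, f / alpha) / abs(alpha)
)
conjugate_symmetry_gap = abs(fourier_transform(x, -f) - x_hat.conjugate())
```

All three gaps should be at the level of rounding error for this smooth, well-localized signal; for signals with jumps the Riemann sum would be less accurate.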


6.2.2 Parseval-like Theorems 


A key result on the Fourier Transform is that, subject to some restrictions, it preserves
inner products. Thus, if $\hat{x}_1$ and $\hat{x}_2$ are the Fourier Transforms of $x_1$ and $x_2$,
then the inner product $\langle x_1, x_2 \rangle$ between $x_1$ and $x_2$ is typically equal to the inner
product $\langle \hat{x}_1, \hat{x}_2 \rangle$ between their transforms. In this section we shall describe two
scenarios where this holds. A third scenario, which is described in Theorem 6.2.9,
will have to wait until we discuss the FT of signals that are energy-limited but not
integrable.

To see how the next proposition is related to the preservation of the inner product
under the Fourier Transform, think about $g$ as being a function of frequency and
of its IFT $\check{g}$ as a function of time.


Proposition 6.2.4. If $g\colon f \mapsto g(f)$ and $x\colon t \mapsto x(t)$ are integrable mappings from $\mathbb{R}$
to $\mathbb{C}$, then

$$ \int_{-\infty}^{\infty} x(t)\, \check{g}^{*}(t)\, dt = \int_{-\infty}^{\infty} \hat{x}(f)\, g^{*}(f)\, df, \tag{6.8} $$

i.e.,

$$ \langle x, \check{g} \rangle = \langle \hat{x}, g \rangle, \quad g, x \in \mathcal{L}_1. \tag{6.9} $$


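Proof. The key to the proof is to use Fubini’s Theorem to justify changing the
order of integration in the following calculation:


$$ \int_{-\infty}^{\infty} x(t)\, \check{g}^{*}(t)\, dt = \int_{-\infty}^{\infty} x(t) \left( \int_{-\infty}^{\infty} g(f)\, e^{i 2\pi f t}\, df \right)^{*} dt $$

$$ = \int_{-\infty}^{\infty} x(t) \int_{-\infty}^{\infty} g^{*}(f)\, e^{-i 2\pi f t}\, df\, dt $$

$$ = \int_{-\infty}^{\infty} g^{*}(f) \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt\, df $$

$$ = \int_{-\infty}^{\infty} g^{*}(f)\, \hat{x}(f)\, df, $$


where the first equality follows from the definition of $\check{g}$; the second because the
conjugation of an integral is accomplished by conjugating the integrand (Proposition
2.3.1); the third by changing the order of integration; and the final equality
by the definition of the FT of $x$.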


A related result is that the convolution of an integrable function with the IFT of 
an integrable function is always defined: 


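Proposition 6.2.5. If the mappings $x\colon t \mapsto x(t)$ and $g\colon f \mapsto g(f)$ from $\mathbb{R}$ to $\mathbb{C}$ are
both integrable, then the convolution $x \star \check{g}$ is defined at every epoch $t \in \mathbb{R}$ and


$$ (x \star \check{g})(t) = \int_{-\infty}^{\infty} \hat{x}(f)\, g(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{6.10} $$

Proof. Here too the key is in changing the order of integration:

$$ (x \star \check{g})(t) = \int_{-\infty}^{\infty} x(\tau)\, \check{g}(t - \tau)\, d\tau $$

$$ = \int_{-\infty}^{\infty} x(\tau) \int_{-\infty}^{\infty} e^{i 2\pi f (t - \tau)}\, g(f)\, df\, d\tau $$

$$ = \int_{-\infty}^{\infty} g(f)\, e^{i 2\pi f t} \int_{-\infty}^{\infty} x(\tau)\, e^{-i 2\pi f \tau}\, d\tau\, df $$

$$ = \int_{-\infty}^{\infty} g(f)\, \hat{x}(f)\, e^{i 2\pi f t}\, df, $$

where the first equality follows from the definition of the convolution; the second
from the definition of the IFT; the third by changing the order of integration; and
the final equality by the definition of the FT. The justification of the changing of the
order of integration can be argued using Fubini’s Theorem because, by assumption,
both $g$ and $x$ are integrable.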


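We next present another useful version of the preservation of inner products under
the FT. It is useful for functions (of time) that are zero outside some interval
$[-T, T]$ or for the IFT of functions (of frequency) that are zero outside an interval
$[-W, W]$.


Proposition 6.2.6 (A Mini Parseval Theorem).


(i) Let the signals $x_1$ and $x_2$ be given by

$$ x_\nu(t) = \int_{-\infty}^{\infty} g_\nu(f)\, e^{i 2\pi f t}\, df, \quad \bigl( t \in \mathbb{R},\ \nu = 1, 2 \bigr), \tag{6.11a} $$

where the functions $g_\nu\colon f \mapsto g_\nu(f)$ satisfy

$$ g_\nu(f) = 0, \quad \bigl( |f| > W,\ \nu = 1, 2 \bigr), \tag{6.11b} $$

for some $W \ge 0$, and

$$ \int_{-\infty}^{\infty} |g_\nu(f)|^2\, df < \infty, \quad (\nu = 1, 2). \tag{6.11c} $$

Then

$$ \langle x_1, x_2 \rangle = \langle g_1, g_2 \rangle. \tag{6.11d} $$

(ii) Let $g_1$ and $g_2$ be given by

$$ g_\nu(f) = \int_{-\infty}^{\infty} x_\nu(t)\, e^{-i 2\pi f t}\, dt, \quad \bigl( f \in \mathbb{R},\ \nu = 1, 2 \bigr), \tag{6.12a} $$

where the signals $x_1, x_2 \in \mathcal{L}_2$ are such that for some $T \ge 0$

$$ x_\nu(t) = 0, \quad \bigl( |t| > T,\ \nu = 1, 2 \bigr). \tag{6.12b} $$

Then

$$ \langle x_1, x_2 \rangle = \langle g_1, g_2 \rangle. \tag{6.12c} $$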


Proof. See the proof of Lemma A.3.6 on Page 693 and its corollary in the appendix.

6.2.3 The $L_2$-Fourier Transform


To appreciate some of the mathematical subtleties of this section, the reader is
encouraged to review Section 4.7 in order to recall the difference between the
space $L_2$ and the space $\mathcal{L}_2$ and in order to recall the difference between an
energy-limited signal $x \in \mathcal{L}_2$ and the equivalence class $[x] \in L_2$ to which it belongs. In this
section we shall sketch how the Fourier Transform is defined for elements of $L_2$.
This section can be skipped provided that you are willing to take on faith that
such a transform exists and that, very roughly speaking, it has some of the same
properties of the Fourier Transform of Definition 6.2.1. To differentiate between
the transform of Definition 6.2.1 and the transform that we are about to define
for elements of $L_2$, we shall refer in this section to the former as the $\mathcal{L}_1$-Fourier
Transform and to the latter as the $L_2$-Fourier Transform. Both will be denoted
by a “hat.” In subsequent sections the Fourier Transform will be understood to be
the $\mathcal{L}_1$-Fourier Transform unless explicitly otherwise specified.


Some readers may have already encountered the $L_2$-Fourier Transform without
even being aware of it. For example, the sinc(·) function, which is defined in (5.20),
is an energy-limited signal that is not integrable. Consequently, its $\mathcal{L}_1$-Fourier
Transform is undefined. Nevertheless, you may have seen its Fourier Transform
being given as the Brickwall function. As we shall see, this is somewhat in line
with how the $L_2$-Fourier Transform of the sinc(·) is defined.³ For more on the
Fourier Transform of the sinc(·) see Section 6.2.5. Another example of an
energy-limited signal that is not integrable is $t \mapsto 1/(1 + |t|)$.


We next sketch how the $L_2$-Fourier Transform is defined and explore some of its
key properties. We begin with the bad news.


(i) There is no explicit simple expression for the $L_2$-Fourier Transform.


(ii) The result of applying the transform is not a function but an equivalence
class of functions.


The $L_2$-Fourier Transform is a mapping

$$ \hat{\ }\colon L_2 \to L_2 $$

that maps elements of $L_2$ to elements of $L_2$. It thus maps equivalence classes
to equivalence classes, not functions. As long as the operation we perform on
the result of the $L_2$-Fourier Transform does not depend on which member of the
equivalence class it is performed on, there is no need to worry about this issue.
Otherwise, we can end up performing operations that are ill-defined. For example,
an operation that is ill-defined is evaluating the result of the transform at a given
frequency, say at $f = 17$.


An operation you cannot go wrong with is integration, because the integrals of
two functions that differ on a set of measure zero are equal; see Proposition 2.5.3.
Consequently, inner products, which are defined via integration, are fine too. In
this book we shall therefore refrain from applying to the result of the $L_2$-Fourier
Transform any operation other than integration (or related operations such as the
computation of energy or inner product). In fact, since we find the notion of
equivalence classes somewhat abstract we shall try to minimize its use.


³ However, as we shall see, the result of the $L_2$-Fourier Transform is an element of $L_2$, i.e., an
equivalence class, and not a function.


Suppose that $x \in \mathcal{L}_2$ is an energy-limited signal and that $[x] \in L_2$ is its equivalence
class. How do we define the $L_2$-Fourier Transform of $[x]$? We first define for every
positive integer $n$ the time-truncated function

$$ x_n\colon t \mapsto x(t)\, \mathrm{I}\{|t| \le n\} $$


and note that, by Proposition 3.4.3, $x_n$ is integrable. Consequently, its $\mathcal{L}_1$-Fourier
Transform $\hat{x}_n$ is well-defined and is given by


$$ \hat{x}_n(f) = \int_{-n}^{n} x(t)\, e^{-i 2\pi f t}\, dt, \quad f \in \mathbb{R}. $$


We then note that $\|x - x_n\|_2$ tends to zero as $n$ tends to infinity, so for every $\epsilon > 0$
there exists some $L(\epsilon)$ sufficiently large so that


$$ \|x_n - x_m\|_2 < \epsilon, \quad n, m > L(\epsilon). \tag{6.13} $$


Applying Proposition 6.2.6 (ii) with the substitution of $\max\{n, m\}$ for $T$ and of
$x_n - x_m$ for both $x_1$ and $x_2$, we obtain that (6.13) implies


$$ \|\hat{x}_n - \hat{x}_m\|_2 < \epsilon, \quad n, m > L(\epsilon). \tag{6.14} $$


Because the space of energy-limited signals is complete in the sense of Theorem
8.5.1 ahead, we may infer from (6.14) that there exists some function $\zeta \in \mathcal{L}_2$
such that $\|\hat{x}_n - \zeta\|_2$ converges to zero.⁴ We then define the $L_2$-Fourier Transform
of the equivalence class $[x]$ to be the equivalence class $[\zeta]$. In view of Footnote 4
we can define the $L_2$-Fourier Transform as follows.


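Definition 6.2.7 ($L_2$-Fourier Transform). The $L_2$-Fourier Transform of the
equivalence class $[x] \in L_2$ is denoted by $\widehat{[x]}$ and is given by


$$ \widehat{[x]} \triangleq \left\{ g \in \mathcal{L}_2 : \lim_{n \to \infty} \int_{-\infty}^{\infty} \left| g(f) - \int_{-n}^{n} x(t)\, e^{-i 2\pi f t}\, dt \right|^2 df = 0 \right\}. $$


The main properties of the $L_2$-Fourier Transform are summarized in the following
theorem.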


Theorem 6.2.8 (Properties of the $L_2$-Fourier Transform). The $L_2$-Fourier Transform
is a mapping from $L_2$ onto $L_2$ with the following properties:


(i) If $x \in \mathcal{L}_2 \cap \mathcal{L}_1$, then the $L_2$-Fourier Transform of $[x]$ is the equivalence class
of the mapping


$$ f \mapsto \int_{-\infty}^{\infty} x(t)\, e^{-i 2\pi f t}\, dt. $$


⁴ The function $\zeta$ is not unique. If $\|\hat{x}_n - \zeta\|_2 \to 0$, then also $\|\hat{x}_n - \tilde{\zeta}\|_2 \to 0$ whenever $\tilde{\zeta} \in [\zeta]$.
And conversely, if $\|\hat{x}_n - \zeta\|_2 \to 0$ and $\|\hat{x}_n - \tilde{\zeta}\|_2 \to 0$, then $\tilde{\zeta}$ must be in $[\zeta]$.

(ii) The $L_2$-Fourier Transform is linear in the sense that


$$ \widehat{\alpha [x_1] + \beta [x_2]} = \alpha\, \widehat{[x_1]} + \beta\, \widehat{[x_2]}, \quad \bigl( x_1, x_2 \in \mathcal{L}_2,\ \alpha, \beta \in \mathbb{C} \bigr). $$


(iii) The $L_2$-Fourier Transform is invertible in the sense that to each $[g] \in L_2$
there corresponds a unique equivalence class in $L_2$ whose $L_2$-Fourier Transform
is $[g]$. This equivalence class can be obtained by reflecting each of the
elements of $[g]$ to obtain the equivalence class $[\vec{g}]$ of $\vec{g}$, and by then applying
the $L_2$-Fourier Transform to it. The result $\widehat{[\vec{g}]}$ then satisfies


$$ \widehat{\widehat{[\vec{g}]}} = [g], \quad g \in \mathcal{L}_2. \tag{6.15} $$


(iv) Applying the $L_2$-Fourier Transform twice is equivalent to reflecting the elements
of the equivalence class


$$ \widehat{\widehat{[x]}} = [\vec{x}], \quad x \in \mathcal{L}_2. \tag{6.16} $$

(v) The $L_2$-Fourier Transform preserves energies:⁵


$$ \bigl\| \widehat{[x]} \bigr\|_2 = \bigl\| [x] \bigr\|_2, \quad x \in \mathcal{L}_2. \tag{6.17} $$

(vi) The $L_2$-Fourier Transform preserves inner products:⁶

$$ \bigl\langle [x], [y] \bigr\rangle = \bigl\langle \widehat{[x]}, \widehat{[y]} \bigr\rangle, \quad x, y \in \mathcal{L}_2. \tag{6.18} $$

Proof. This theorem is a restatement of (Rudin, 1974, Chapter 9, Theorem 9.13).


Identity (6.16) appears in this form in (Stein and Weiss, 1990, Chapter 1, Section 2,
Theorem 2.4).


The result that the $L_2$-Fourier Transform preserves energies is sometimes called
Plancherel’s Theorem and the result that it preserves inner products Parseval’s
Theorem. We shall use “Parseval’s Theorem” for both. It is so important that
we repeat it here in the form of a theorem. Following mathematical practice, we
drop the square brackets in the theorem’s statement.


Theorem 6.2.9 (Parseval’s Theorem). For any $x, y \in \mathcal{L}_2$


$$ \langle x, y \rangle = \langle \hat{x}, \hat{y} \rangle \tag{6.19} $$


and


$$ \|x\|_2 = \|\hat{x}\|_2. \tag{6.20} $$


⁵ The energy of an equivalence class was defined in Section 4.7.
⁶ The inner product between equivalence classes was defined in Section 4.7.




As we mentioned earlier, there is no simple explicit expression for the $L_2$-Fourier
Transform. The following proposition simplifies its calculation under certain
assumptions that are, for example, satisfied by the sinc(·) function.


Proposition 6.2.10. If $x = \check{g}$ for some $g \in \mathcal{L}_1 \cap \mathcal{L}_2$, then:
(i) $x \in \mathcal{L}_2$.
(ii) $\|x\|_2 = \|g\|_2$.


(iii) The $L_2$-Fourier Transform of $[x]$ is the equivalence class $[g]$.


Proof. It suffices to prove Part (iii) because Parts (i) and (ii) will then follow from
the preservation of energy under the $L_2$-Fourier Transform (Theorem 6.2.8 (v)).
To prove Part (iii) we compute

$$ [g] = \widehat{\widehat{[\vec{g}]}} = \widehat{\bigl[\hat{\vec{g}}\bigr]} = \widehat{[x]}, $$

where the first equality follows from (6.15); the second from Theorem 6.2.8 (i)
(because the hypothesis $g \in \mathcal{L}_1 \cap \mathcal{L}_2$ implies that $\vec{g} \in \mathcal{L}_1 \cap \mathcal{L}_2$); and the final
equality from Proposition 6.2.3 (i) and from the hypothesis that $x = \check{g}$.


6.2.4 More on the Fourier Transform 


In this section we present additional results that shed some light on the problem of 
reconstructing a signal from its FT. The first is a continuity result, which may seem 
technical but which has some useful consequences. It can be used to show that the 
IFT (of an integrable function) always yields a continuous signal. Consequently, 
if one starts with a discontinuous function, takes its FT, and then the IFT, one 
does not obtain the original function. It can also be used—once we define the 
frequency response of a filter in Section 6.3—to show that no stable filter can have 
a discontinuous frequency response. 


Theorem 6.2.11 (Continuity and Boundedness of the Fourier Transform).


(i) If $x$ is integrable, then its FT $\hat{x}$ is a uniformly continuous function satisfying


$$ |\hat{x}(f)| \le \int_{-\infty}^{\infty} |x(t)|\, dt, \quad f \in \mathbb{R}, \tag{6.21} $$

and

$$ \lim_{|f| \to \infty} \hat{x}(f) = 0. \tag{6.22} $$

(ii) If $g$ is integrable, then its IFT $\check{g}$ is a uniformly continuous function satisfying


$$ |\check{g}(t)| \le \int_{-\infty}^{\infty} |g(f)|\, df, \quad t \in \mathbb{R}. \tag{6.23} $$

Proof. We begin with Part (i). Inequality (6.21) follows directly from the definition
of the FT and from Proposition 2.4.1. The proof of the uniform continuity of $\hat{x}$ is
not very difficult but is omitted. See (Katznelson, 1976, Section VI.1, Theorem 1.2).
A proof of (6.22) can be found in (Katznelson, 1976, Section VI.1, Theorem 1.7).
Part (ii) follows by substituting $\vec{g}$ for $x$ in Part (i) because the IFT of $g$ is the FT
of its mirror image (6.5).


The second result we present is that every integrable signal can be reconstructed 
from its FT, but not necessarily via the IFT. The reconstruction formula in (6.25) 
ahead works even when the IFT does not do the job. 


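Theorem 6.2.12 (Reconstructing a Signal from Its Fourier Transform).
(i) If two integrable signals have the same FT, then they are indistinguishable:

$$ \bigl( \hat{x}_1(f) = \hat{x}_2(f), \quad f \in \mathbb{R} \bigr) \implies x_1 \equiv x_2, \quad x_1, x_2 \in \mathcal{L}_1. \tag{6.24} $$


(ii) Every integrable function $x$ can be reconstructed from its FT in the sense that


$$ \lim_{\lambda \to \infty} \int_{-\infty}^{\infty} \left| x(t) - \int_{-\lambda}^{\lambda} \left( 1 - \frac{|f|}{\lambda} \right) \hat{x}(f)\, e^{i 2\pi f t}\, df \right| dt = 0. \tag{6.25} $$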


Proof. See (Katznelson, 1976, Section VI.1.10). 


Conditions under which the IFT of the FT of a signal recovers the signal are given 
in the following theorem. 


Theorem 6.2.13 (The Inversion Theorem).


(i) Suppose that $x$ is integrable and that its FT $\hat{x}$ is also integrable. Define


$$ \tilde{x} = \check{\hat{x}}. \tag{6.26} $$

Then $\tilde{x}$ is a continuous function with

$$ \lim_{|t| \to \infty} \tilde{x}(t) = 0, \tag{6.27} $$

and the functions $x$ and $\tilde{x}$ agree except on a set of Lebesgue measure zero.


(ii) Suppose that $g$ is integrable and that its IFT $\check{g}$ is also integrable. Define


$$ \tilde{g} = \hat{\check{g}}. \tag{6.28} $$

Then $\tilde{g}$ is a continuous function with

$$ \lim_{|f| \to \infty} \tilde{g}(f) = 0 \tag{6.29} $$


and the functions $g$ and $\tilde{g}$ agree except on a set of Lebesgue measure zero.

Proof. For a proof of Part (i) see (Rudin, 1974, Theorem 9.11). Part (ii) follows
by substituting $g$ for $x$ in Part (i) and using Proposition 6.2.3 (iii).


Corollary 6.2.14. 


(i) If $x$ is a continuous integrable signal whose FT is integrable, then

$$\check{\hat{x}} = x. \tag{6.30}$$

(ii) If $g$ is continuous and integrable, and if $\check{g}$ is also integrable, then

$$\hat{\check{g}} = g. \tag{6.31}$$


Proof. Part (i) follows from Theorem 6.2.13 (i) by noting that if two continuous 
functions are equal outside a set of Lebesgue measure zero, then they are identical. 
Part (ii) follows similarly from Theorem 6.2.13 (ii). 


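6.2.5 On the Brickwall and the sinc(·) Functions

We next discuss the FT and the IFT of the Brickwall function

$$f \mapsto \mathrm{I}\{|f| \le 1\}, \tag{6.32}$$

which derives its name from the shape of its plot. Since it is a symmetric function, it follows from (6.6) that its FT and IFT are identical. Both are equal to a properly stretched and scaled sinc(·) function (5.20).

More generally, we offer the reader advice on how to remember that for $\alpha, \gamma > 0$,

$$t \mapsto \delta \operatorname{sinc}(\alpha t) \ \text{ is the IFT of } \ f \mapsto \beta \, \mathrm{I}\{|f| \le \gamma\} \tag{6.33}$$

if, and only if,

$$\delta = 2\gamma\beta \tag{6.34a}$$

and

$$\frac{1}{\alpha} \, \gamma = \frac{1}{2}. \tag{6.34b}$$

Condition (6.34a) is easily remembered because its LHS is the value at $t = 0$ of $\delta \operatorname{sinc}(\alpha t)$ and its RHS is the value at $t = 0$ of the IFT of $f \mapsto \beta \, \mathrm{I}\{|f| \le \gamma\}$:

$$\int_{-\infty}^{\infty} \beta \, \mathrm{I}\{|f| \le \gamma\} \, e^{i 2\pi f t} \, df \, \bigg|_{t=0} = \int_{-\infty}^{\infty} \beta \, \mathrm{I}\{|f| \le \gamma\} \, df = 2\gamma\beta.$$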


Condition (6.34b) is intimately related to the Sampling Theorem that you may have already seen and that we shall discuss in Chapter 8. Indeed, in the Sampling Theorem (Theorem 8.4.3) the time $T$ between consecutive samples and the bandwidth $W$ satisfy

$$T W = \frac{1}{2}.$$

(In this application $\alpha$ corresponds to $1/T$ and $\gamma$ corresponds to the bandwidth $W$.)

[Figure 6.1 (plot not reproduced): a stretched and scaled sinc(·) function, annotated with its value at zero and the location of its first zero, next to a stretched and scaled Brickwall function, annotated with its height and its cutoff.]

Figure 6.1: The stretched & scaled sinc(·) function and the stretched & scaled Brickwall function are an $L_2$ Fourier pair if the value of the former at zero (i.e., $\delta$) is the integral of the latter (i.e., $2 \times \beta \times \text{cutoff}$) and if the product of the location of the first zero of the former by the cutoff of the latter is 1/2.


It is tempting to say that Conditions (6.34) also imply that the FT of the function $t \mapsto \delta \operatorname{sinc}(\alpha t)$ is the function $f \mapsto \beta \, \mathrm{I}\{|f| \le \gamma\}$, but there is a caveat. The signal $t \mapsto \delta \operatorname{sinc}(\alpha t)$ is not integrable. Consequently, its $\mathcal{L}_1$-Fourier Transform (Definition 6.2.1) is undefined. However, since it is energy-limited, its $L_2$-Fourier Transform is defined (Definition 6.2.7). Using Proposition 6.2.10 with the substitution of $f \mapsto \beta \, \mathrm{I}\{|f| \le \gamma\}$ for $g$, we obtain that, indeed, Conditions (6.34) imply that the $L_2$-Fourier Transform of the (equivalence class of the) function $t \mapsto \delta \operatorname{sinc}(\alpha t)$ is the (equivalence class of the) function $f \mapsto \beta \, \mathrm{I}\{|f| \le \gamma\}$.

The relation between the sinc(·) and the Brickwall functions is summarized in Figure 6.1.


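The derivation of the result is straightforward: the IFT of the Brickwall function can be computed as

$$\begin{aligned}
\int_{-\infty}^{\infty} \beta \, \mathrm{I}\{|f| \le \gamma\} \, e^{i 2\pi f t} \, df
&= \beta \int_{-\gamma}^{\gamma} e^{i 2\pi f t} \, df \\
&= \frac{\beta}{i 2\pi t} \, e^{i 2\pi f t} \, \Big|_{f=-\gamma}^{f=\gamma} \\
&= \frac{\beta}{i 2\pi t} \bigl( e^{i 2\pi \gamma t} - e^{-i 2\pi \gamma t} \bigr) \\
&= \frac{\beta}{\pi t} \sin(2\pi \gamma t) \\
&= 2\beta\gamma \operatorname{sinc}(2\gamma t).
\end{aligned} \tag{6.35}$$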
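
The closed form (6.35), and with it Conditions (6.34), can be checked numerically. The following is a minimal sketch under stated assumptions: the values of `beta` and `gamma` are arbitrary illustrative choices, the integral is approximated by a midpoint rule, and `sinc` is the normalized sinc $\sin(\pi \xi)/(\pi \xi)$, which is the convention consistent with the last step of (6.35).

```python
import cmath
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def ift_brickwall(t, beta, gamma, n=20000):
    """Midpoint-rule value of the integral of beta exp(i 2 pi f t) over |f| <= gamma."""
    df = 2 * gamma / n
    total = 0j
    for k in range(n):
        f = -gamma + (k + 0.5) * df
        total += beta * cmath.exp(2j * math.pi * f * t)
    return total * df

beta, gamma = 3.0, 0.75                      # illustrative values only
delta, alpha = 2 * gamma * beta, 2 * gamma   # Conditions (6.34a) and (6.34b)
max_error = max(
    abs(ift_brickwall(t, beta, gamma) - delta * sinc(alpha * t))
    for t in (0.0, 0.2, 1.0, 2.5)
)
print(f"largest deviation from delta * sinc(alpha * t): {max_error:.2e}")
```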
6.3 The Frequency Response of a Filter


Recall that in Section 5.7 we defined a filter of impulse response $h$ to be a physical device that when fed the input $x$ produces the output $x \star h$. Of course, this is only meaningful if the convolution is defined. Subject to some technical assumptions that are made precise in Theorem 6.3.2, the FT of the output waveform $x \star h$ is the product of the FT of the input waveform $x$ by the FT of the impulse response $h$. Consequently, we can think of a filter of impulse response $h$ as a physical device that produces an output signal whose FT is the product of the FT of the input signal and the FT of the impulse response.


The FT of the impulse response is called the frequency response of the filter. If the filter is stable and its impulse response therefore integrable, then we define the filter’s frequency response as the Fourier Transform of the impulse response using Definition 6.2.1 (the $\mathcal{L}_1$-Fourier Transform). If the impulse response is energy-limited but not integrable, then we define the frequency response as the Fourier Transform of the impulse response using the definition of the Fourier Transform for energy-limited signals that are not integrable as in Section 6.2.3 (the $L_2$-Fourier Transform).


Definition 6.3.1 (Frequency Response).


(i) The frequency response of a stable filter is the Fourier Transform of its impulse response as defined in Definition 6.2.1.


(ii) The frequency response of an unstable filter whose impulse response is energy-limited is the $L_2$-Fourier Transform of its impulse response as defined in Section 6.2.3.


As discussed in Section 5.5, if $x, h$ are both integrable, then $x \star h$ is defined at all epochs $t$ outside a set of Lebesgue measure zero, and $x \star h$ is integrable. In this case the FT of $x \star h$ is the mapping $f \mapsto \hat{x}(f) \, \hat{h}(f)$. If $x$ is integrable and $h$ is of finite energy, then $x \star h$ is also defined at all epochs $t$ outside a set of Lebesgue measure zero. But in this case the convolution is only guaranteed to be of finite energy; it need not be integrable. We can discuss its Fourier Transform using the definition of the $L_2$-Fourier Transform for energy-limited signals that are not integrable as in Section 6.2.3. In this case, again, the $L_2$-Fourier Transform of $x \star h$ is the (equivalence class of the) mapping $f \mapsto \hat{x}(f) \, \hat{h}(f)$:⁷


⁷ To be precise we should say that the $L_2$-Fourier Transform of $x \star h$ is the equivalence class of the product of the $\mathcal{L}_1$-Fourier Transform of $x$ by any element in the equivalence class consisting of the $L_2$-Fourier Transform of $h$.


Theorem 6.3.2 (The Fourier Transform of a Convolution).

(i) If the signals $h$ and $x$ are both integrable, then the convolution $x \star h$ is defined for all $t$ outside a set of Lebesgue measure zero; it is integrable; and its $\mathcal{L}_1$-Fourier Transform $\widehat{x \star h}$ is given by

$$\widehat{x \star h}(f) = \hat{x}(f) \, \hat{h}(f), \quad f \in \mathbb{R}, \tag{6.36}$$

where $\hat{x}$ and $\hat{h}$ are the $\mathcal{L}_1$-Fourier Transforms of $x$ and $h$.

(ii) If the signal $x$ is integrable and if $h$ is of finite energy, then the convolution $x \star h$ is defined for all $t$ outside a set of Lebesgue measure zero; it is energy-limited; and its $L_2$-Fourier Transform $\widehat{x \star h}$ is also given by (6.36) with $\hat{x}$, as before, being the $\mathcal{L}_1$-Fourier Transform of $x$ but with $\hat{h}$ now being the $L_2$-Fourier Transform of $h$.


[Figure 6.2 (plot not reproduced): $\widehat{\mathrm{LPF}}_{W_c}(f)$ plotted against $f$; the response is 1 for $-W_c \le f \le W_c$ and 0 elsewhere.]

Figure 6.2: The frequency response of the ideal unit-gain lowpass filter of cutoff frequency $W_c$. Notice that $W_c$ is the length of the interval of positive frequencies where the gain is one.


Proof. For a proof of Part (i) see, for example, (Stein and Weiss, 1990, Chapter 1, 
Section 1, Theorem 1.4). For Part (ii) see (Stein and Weiss, 1990, Chapter 1, 
Section 2, Theorem 2.6). 
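
A concrete instance of Part (i) may help. The sketch below is an illustration under stated assumptions, not the book's example: both $x$ and $h$ are taken to be the unit-width rectangular pulse, whose FT is $f \mapsto \operatorname{sinc}(f)$ and whose self-convolution is the triangle $t \mapsto \max\{0, 1 - |t|\}$. The code checks numerically that the FT of the convolution equals $\operatorname{sinc}^2(f)$, as (6.36) predicts. All function names are hypothetical.

```python
import cmath
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def rect(t):
    """Unit-width rectangular pulse."""
    return 1.0 if abs(t) <= 0.5 else 0.0

def convolve_rects(t, n=4000):
    """Midpoint-rule value of the integral of rect(tau) rect(t - tau) dtau."""
    dtau = 1.0 / n
    return sum(rect(t - (-0.5 + (k + 0.5) * dtau)) for k in range(n)) * dtau

def triangle(t):
    """Closed form of the convolution of two unit-width rectangular pulses."""
    return max(0.0, 1.0 - abs(t))

def ft_of_convolution(f, n=4000):
    """Midpoint-rule FT of the triangle, which is supported on [-1, 1]."""
    dt = 2.0 / n
    total = 0j
    for k in range(n):
        t = -1.0 + (k + 0.5) * dt
        total += triangle(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt

conv_error = max(abs(convolve_rects(t) - triangle(t)) for t in (0.0, 0.25, -0.6, 1.2))
ft_error = max(abs(ft_of_convolution(f) - sinc(f) ** 2) for f in (0.0, 0.3, 1.0, 2.7))
print(conv_error, ft_error)  # both are small discretization errors
```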


As an example, recall from Section 5.9 that the unit-gain ideal lowpass filter of cutoff frequency $W_c$ is a filter of impulse response

$$h(t) = 2 W_c \operatorname{sinc}(2 W_c t), \quad t \in \mathbb{R}. \tag{6.37}$$

This filter is not causal and not stable, but its impulse response is energy-limited. The filter’s frequency response is the $L_2$-Fourier Transform of the impulse response (6.37), which, using the results from Section 6.2.5, is given by (the equivalence class of) the mapping

$$f \mapsto \mathrm{I}\{|f| \le W_c\}, \quad f \in \mathbb{R}. \tag{6.38}$$

This mapping maps all frequencies $f$ satisfying $|f| > W_c$ to 0 and all frequencies satisfying $|f| \le W_c$ to one. It is for this reason that we use the adjective “unit-gain” in describing this filter. We denote the mapping in (6.38) by $\widehat{\mathrm{LPF}}_{W_c}(\cdot)$ so

$$\widehat{\mathrm{LPF}}_{W_c}(f) \triangleq \mathrm{I}\{|f| \le W_c\}, \quad f \in \mathbb{R}. \tag{6.39}$$

This mapping is depicted in Figure 6.2. Note that $W_c$ is the length of the interval of positive frequencies where the response is one.


Turning to the ideal unit-gain bandpass filter of bandwidth $W$ around the carrier frequency $f_c$ satisfying $f_c > W/2$, we note that, by (5.21), its time-$t$ impulse response $\mathrm{BPF}_{W,f_c}(t)$ is given by

$$\begin{aligned}
\mathrm{BPF}_{W,f_c}(t) &= 2W \cos(2\pi f_c t) \operatorname{sinc}(W t) \\
&= 2 \operatorname{Re}\bigl( \mathrm{LPF}_{W/2}(t) \, e^{i 2\pi f_c t} \bigr).
\end{aligned} \tag{6.40}$$

[Figure 6.3 (plot not reproduced): $\widehat{\mathrm{BPF}}_{W,f_c}(f)$ plotted against $f$; the response is 1 on two bands of width $W$ centered at $-f_c$ and $f_c$, and 0 elsewhere.]

Figure 6.3: The frequency response of the ideal unit-gain bandpass filter of bandwidth $W$ around the carrier frequency $f_c$. Notice that, as for the lowpass filter, $W$ is the length of the interval of positive frequencies where the gain is one.

This filter too is noncausal and nonstable. From (6.40) and (6.39) we obtain using Table 6.1 that its frequency response is (the equivalence class of) the mapping

$$f \mapsto \mathrm{I}\Bigl\{ \bigl| |f| - f_c \bigr| \le \frac{W}{2} \Bigr\}.$$

We denote this mapping by $\widehat{\mathrm{BPF}}_{W,f_c}(\cdot)$ so

$$\widehat{\mathrm{BPF}}_{W,f_c}(f) \triangleq \mathrm{I}\Bigl\{ \bigl| |f| - f_c \bigr| \le \frac{W}{2} \Bigr\}, \quad f \in \mathbb{R}. \tag{6.41}$$

This mapping is depicted in Figure 6.3. Note that, as for the lowpass filter, $W$ is the length of the interval of positive frequencies where the response is one.
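
The two expressions in (6.40) and the frequency response (6.41) can be cross-checked numerically. The sketch below is illustrative only: the bandwidth and carrier values are arbitrary choices satisfying $f_c > W/2$, the inverse transform of the indicator in (6.41) is approximated by a midpoint rule, and all function names are hypothetical.

```python
import cmath
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def lpf(t, cutoff):
    """Impulse response of the ideal unit-gain lowpass filter, as in (6.37)."""
    return 2 * cutoff * sinc(2 * cutoff * t)

def bpf_direct(t, w, fc):
    """First line of (6.40)."""
    return 2 * w * math.cos(2 * math.pi * fc * t) * sinc(w * t)

def bpf_via_lpf(t, w, fc):
    """Second line of (6.40): twice the real part of a modulated lowpass response."""
    return 2 * (lpf(t, w / 2) * cmath.exp(2j * math.pi * fc * t)).real

def bpf_via_ift(t, w, fc, n=20000):
    """IFT of the indicator in (6.41).

    The two passbands are mirror images, so the integral equals twice the
    integral of cos(2 pi f t) over the positive band [fc - w/2, fc + w/2].
    """
    df = w / n
    total = 0.0
    for k in range(n):
        f = fc - w / 2 + (k + 0.5) * df
        total += math.cos(2 * math.pi * f * t)
    return 2 * total * df

w, fc = 2.0, 5.0  # illustrative values with fc > w/2
times = (0.0, 0.13, 0.7)
gap_forms = max(abs(bpf_direct(t, w, fc) - bpf_via_lpf(t, w, fc)) for t in times)
gap_ift = max(abs(bpf_direct(t, w, fc) - bpf_via_ift(t, w, fc)) for t in times)
print(gap_forms, gap_ift)
```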


6.4 Bandlimited Signals and Lowpass Filtering 


In this section we define bandlimited signals and discuss lowpass filtering. We treat energy-limited signals and integrable signals separately. As we shall see, any integrable signal that is bandlimited to $W$ Hz is also an energy-limited signal that is bandlimited to $W$ Hz (Note 6.4.12).

6.4.1 Energy-Limited Signals 

The main result of this section is that the following three statements are equivalent:

(a) The signal $x$ is an energy-limited signal satisfying

$$(x \star \mathrm{LPF}_W)(t) = x(t), \quad t \in \mathbb{R}. \tag{6.42}$$

(b) The signal $x$ can be expressed in the form

$$x(t) = \int_{-W}^{W} g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}, \tag{6.43a}$$

for some measurable function $g \colon f \mapsto g(f)$ satisfying

$$\int_{-W}^{W} |g(f)|^2 \, df < \infty. \tag{6.43b}$$

(c) The signal $x$ is a continuous energy-limited signal whose $L_2$-Fourier Transform $\hat{x}$ satisfies

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df = \int_{-W}^{W} |\hat{x}(f)|^2 \, df. \tag{6.44}$$

We can thus define $x$ to be an energy-limited signal that is bandlimited to $W$ Hz if one (and hence all) of the above conditions hold.

In deriving this result we shall take (a) as the definition. We shall then establish the equivalence (a) ⇔ (b) in Proposition 6.4.5, which also establishes that the function $g$ in (6.43a) can be taken as any element in the equivalence class of the $L_2$-Fourier Transform of $x$, and that the LHS of (6.43b) is then $\|x\|_2^2$. Finally, we shall establish the equivalence (a) ⇔ (c) in Proposition 6.4.6.


We conclude the section with a summary of the key properties of the result of 
passing an energy-limited signal through an ideal unit-gain lowpass filter. 


We begin by defining an energy-limited signal to be bandlimited to $W$ Hz if it is unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff frequency $W$. Recalling that we are denoting by $\mathrm{LPF}_W(t)$ the time-$t$ impulse response of an ideal unit-gain lowpass filter of cutoff frequency $W$ (see (5.19)), we have the following definition.⁸


Definition 6.4.1 (Energy-Limited Bandlimited Signals). We say that the signal $x$ is an energy-limited signal that is bandlimited to $W$ Hz if $x$ is in $\mathcal{L}_2$ and

$$(x \star \mathrm{LPF}_W)(t) = x(t), \quad t \in \mathbb{R}. \tag{6.45}$$


Note 6.4.2. If an energy-limited signal that is bandlimited to W Hz is of zero 
energy, then it is the all-zero signal 0. 


Proof. Let $x$ be an energy-limited signal that is bandlimited to $W$ Hz and that has zero energy. Then

$$\begin{aligned}
|x(t)| &= \bigl| (x \star \mathrm{LPF}_W)(t) \bigr| \\
&\le \|x\|_2 \, \|\mathrm{LPF}_W\|_2 \\
&= \|x\|_2 \, \sqrt{2W} \\
&= 0, \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows because $x$ is an energy-limited signal that is bandlimited to $W$ Hz and is thus unaltered when it is lowpass filtered; the subsequent inequality follows from (5.6b); the subsequent equality by computing $\|\mathrm{LPF}_W\|_2$ using Parseval’s Theorem and the explicit form of the frequency response of the ideal unit-gain lowpass filter of bandwidth $W$ (6.38); and where the final equality follows from the hypothesis that $x$ is of zero energy.


⁸ Even though the ideal unit-gain lowpass filter of cutoff frequency $W$ is not stable, its impulse response $\mathrm{LPF}_W(\cdot)$ is of finite energy (because it decays like $1/t$ and the integral of $1/t^2$ from one to infinity is finite). Consequently, we can use the Cauchy-Schwarz Inequality to prove that if $x \in \mathcal{L}_2$ then the mapping $\tau \mapsto x(\tau) \, \mathrm{LPF}_W(t - \tau)$ is integrable for every time instant $t \in \mathbb{R}$. Consequently, the convolution $x \star \mathrm{LPF}_W$ is defined at every time instant $t$; see Section 5.5.
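
For completeness, the norm of the lowpass impulse response used in the proof can be computed in one line. The display below is a worked step added for illustration; it uses only Parseval's Theorem and the frequency response (6.38) with cutoff $W$:

```latex
\|\mathrm{LPF}_W\|_2^2
  = \int_{-\infty}^{\infty} \bigl|\widehat{\mathrm{LPF}}_W(f)\bigr|^2 \, df
  = \int_{-\infty}^{\infty} \mathrm{I}\{|f| \le W\} \, df
  = 2W,
\qquad\text{so}\qquad
\|\mathrm{LPF}_W\|_2 = \sqrt{2W}.
```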


Having defined what it means for an energy-limited signal to be bandlimited to $W$ Hz, we can now define its bandwidth.⁹


Definition 6.4.3 (Bandwidth). The bandwidth of an energy-limited signal x is 
the smallest frequency W to which x is bandlimited. 


The next lemma shows that the result of passing an energy-limited signal through 
an ideal unit-gain lowpass filter of cutoff frequency W is an energy-limited signal 
that is bandlimited to W Hz. 


Lemma 6.4.4. 


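(i) Let $y = x \star \mathrm{LPF}_W$ be the output of an ideal unit-gain lowpass filter of cutoff frequency $W$ that is fed the energy-limited input $x \in \mathcal{L}_2$. Then $y \in \mathcal{L}_2$;

$$y(t) = \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}; \tag{6.46}$$

and the $L_2$-Fourier Transform of $y$ is the (equivalence class of the) mapping

$$f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}. \tag{6.47}$$

(ii) If $g \colon f \mapsto g(f)$ is a bounded integrable function and if $x$ is energy-limited, then $x \star \check{g}$ is in $\mathcal{L}_2$; it can be expressed as

$$(x \star \check{g})(t) = \int_{-\infty}^{\infty} \hat{x}(f) \, g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}; \tag{6.48}$$

and its $L_2$-Fourier Transform is given by (the equivalence class of) the mapping $f \mapsto \hat{x}(f) \, g(f)$.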


Proof. Even though Part (i) is a special case of Part (ii) corresponding to $g$ being the mapping $f \mapsto \mathrm{I}\{|f| \le W\}$, we shall prove the two parts separately. We begin with a proof of Part (i). The idea of the proof is to express for each $t \in \mathbb{R}$ the time-$t$ output $y(t)$ as an inner product and to then use Parseval’s Theorem. Thus, (6.46) follows from the calculation

$$\begin{aligned}
y(t) &= (x \star \mathrm{LPF}_W)(t) \\
&= \int_{-\infty}^{\infty} x(\tau) \, \mathrm{LPF}_W(t - \tau) \, d\tau \\
&= \bigl\langle x, \ \tau \mapsto \mathrm{LPF}_W(t - \tau) \bigr\rangle \\
&= \bigl\langle x, \ \tau \mapsto \mathrm{LPF}_W(\tau - t) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i 2\pi f t} \, \widehat{\mathrm{LPF}}_W(f) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i 2\pi f t} \, \mathrm{I}\{|f| \le W\} \bigr\rangle \\
&= \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df,
\end{aligned}$$

where the fourth equality follows from the symmetry of the function $\mathrm{LPF}_W(\cdot)$, and where the fifth equality follows from Parseval’s Theorem and the fact that delaying a function multiplies its FT by a complex exponential. Having established (6.46), Part (i) now follows from Proposition 6.2.10, because, by Parseval’s Theorem, the mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$ is of finite energy and hence, by Proposition 3.4.3, also integrable.


⁹ To be more rigorous we should use in this definition the term “infimum” instead of “smallest,” but it turns out that the infimum here is also a minimum.


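We next turn to Part (ii). We first note that the assumption that $g$ is bounded and integrable implies that it is also energy-limited, because if $|g(f)| \le \sigma_\infty$ for all $f \in \mathbb{R}$, then $|g(f)|^2 \le \sigma_\infty |g(f)|$ and $\int |g(f)|^2 \, df \le \sigma_\infty \int |g(f)| \, df$. Thus,

$$g \in \mathcal{L}_1 \cap \mathcal{L}_2. \tag{6.49}$$

We next prove (6.48). To that end we express the convolution $x \star \check{g}$ at time $t$ as an inner product and then use Parseval’s Theorem to obtain

$$\begin{aligned}
(x \star \check{g})(t) &= \int_{-\infty}^{\infty} x(\tau) \, \check{g}(t - \tau) \, d\tau \\
&= \bigl\langle x, \ \tau \mapsto \check{g}^*(t - \tau) \bigr\rangle \\
&= \bigl\langle \hat{x}, \ f \mapsto e^{-i 2\pi f t} \, g^*(f) \bigr\rangle \\
&= \int_{-\infty}^{\infty} \hat{x}(f) \, g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R},
\end{aligned} \tag{6.50}$$

where the third equality follows from Parseval’s Theorem and by noting that the $L_2$-Fourier Transform of the mapping $\tau \mapsto \check{g}^*(t - \tau)$ is the equivalence class of the mapping $f \mapsto e^{-i 2\pi f t} \, g^*(f)$, as can be verified by expressing the mapping $\tau \mapsto \check{g}^*(t - \tau)$ as the IFT of the mapping $f \mapsto e^{-i 2\pi f t} \, g^*(f)$

$$\begin{aligned}
\check{g}^*(t - \tau) &= \biggl( \int_{-\infty}^{\infty} g(f) \, e^{i 2\pi f (t - \tau)} \, df \biggr)^{*} \\
&= \int_{-\infty}^{\infty} g^*(f) \, e^{i 2\pi f (\tau - t)} \, df \\
&= \int_{-\infty}^{\infty} \bigl( g^*(f) \, e^{-i 2\pi f t} \bigr) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R},
\end{aligned}$$

and by then applying Proposition 6.2.10 to the mapping $f \mapsto g^*(f) \, e^{-i 2\pi f t}$, which is in $\mathcal{L}_1 \cap \mathcal{L}_2$ by (6.49).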


Having established (6.48) we next examine the integrand in (6.48) and note that if $|g(f)|$ is upper-bounded by $\sigma_\infty$, then the modulus of the integrand is upper-bounded by $\sigma_\infty |\hat{x}(f)|$, so the assumption that $x \in \mathcal{L}_2$ (and hence that $\hat{x}$ is of finite energy) guarantees that the integrand is square integrable. Also, by the Cauchy-Schwarz Inequality, the square integrability of $g$ and of $\hat{x}$ implies that the integrand is integrable. Thus, the integrand is both square integrable and integrable so, by Proposition 6.2.10, the signal $x \star \check{g}$ is square integrable and its Fourier Transform is the (equivalence class of the) mapping $f \mapsto \hat{x}(f) \, g(f)$.
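
Part (i) of the lemma says that lowpass filtering in the time domain and truncating the spectrum to $[-W, W]$ give the same signal. The sketch below checks this numerically for one input; it is an illustration under stated assumptions, not part of the original text. The input is assumed to be the Gaussian $x(t) = e^{-\pi t^2}$, whose FT under the convention used here is $\hat{x}(f) = e^{-\pi f^2}$, and the cutoff value and function names are arbitrary.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def lowpass_by_convolution(t, w, tau_max=6.0, n=6000):
    """Midpoint-rule value of the integral of x(tau) LPF_W(t - tau) dtau.

    Here x(tau) = exp(-pi tau^2) and LPF_W(t) = 2W sinc(2Wt); the Gaussian is
    negligible beyond |tau| = tau_max, so the integral is truncated there.
    """
    dtau = 2 * tau_max / n
    total = 0.0
    for k in range(n):
        tau = -tau_max + (k + 0.5) * dtau
        total += math.exp(-math.pi * tau * tau) * 2 * w * sinc(2 * w * (t - tau))
    return total * dtau

def lowpass_by_spectrum(t, w, n=4000):
    """Midpoint-rule value of the right-hand side of (6.46) for x_hat(f) = exp(-pi f^2).

    The spectrum is real and even, so only the cosine part contributes.
    """
    df = 2 * w / n
    total = 0.0
    for k in range(n):
        f = -w + (k + 0.5) * df
        total += math.exp(-math.pi * f * f) * math.cos(2 * math.pi * f * t)
    return total * df

W = 0.5  # illustrative cutoff frequency in Hz
gap = max(abs(lowpass_by_convolution(t, W) - lowpass_by_spectrum(t, W))
          for t in (0.0, 0.4, 1.3))
print(f"largest gap between the two computations: {gap:.2e}")
```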


With the aid of the above lemma we can now give an equivalent definition for 
energy-limited signals that are bandlimited to W Hz. This definition is popular 
among mathematicians, because it does not involve the L2-Fourier Transform and 
because the continuity of the signal is implied. 


Proposition 6.4.5 (On the Definition of Bandlimited Functions in $\mathcal{L}_2$).


(i) If $x$ is an energy-limited signal that is bandlimited to $W$ Hz, then it can be expressed in the form

$$x(t) = \int_{-W}^{W} g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}, \tag{6.51}$$

where $g(\cdot)$ satisfies

$$\int_{-W}^{W} |g(f)|^2 \, df < \infty \tag{6.52}$$

and can be taken as (any function in the equivalence class of) $\hat{x}$.


(ii) If a signal $x$ can be expressed as in (6.51) for some function $g(\cdot)$ satisfying (6.52), then $x$ is an energy-limited signal that is bandlimited to $W$ Hz and $\hat{x}$ is (the equivalence class of) the mapping $f \mapsto g(f) \, \mathrm{I}\{|f| \le W\}$.


Proof. We first prove Part (i). Let $x$ be an energy-limited signal that is bandlimited to $W$ Hz. Then

$$\begin{aligned}
x(t) &= (x \star \mathrm{LPF}_W)(t) \\
&= \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from Definition 6.4.1, and where the second equality follows from Lemma 6.4.4 (i). Consequently, if we pick $g$ as (any element of the equivalence class of) $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$, then (6.51) will be satisfied and (6.52) will follow from Parseval’s Theorem.


To prove Part (ii) define $\tilde{g} \colon f \mapsto g(f) \, \mathrm{I}\{|f| \le W\}$. From the assumption (6.52) and from Proposition 3.4.3 it then follows that $\tilde{g} \in \mathcal{L}_1 \cap \mathcal{L}_2$. This and (6.51) imply that $x \in \mathcal{L}_2$ and that the $L_2$-Fourier Transform of (the equivalence class of) $x$ is (the equivalence class of) $\tilde{g}$; see Proposition 6.2.10. To complete the proof of Part (ii) it thus remains to show that $x \star \mathrm{LPF}_W = x$. This follows from the calculation:

$$\begin{aligned}
(x \star \mathrm{LPF}_W)(t) &= \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df \\
&= \int_{-W}^{W} g(f) \, e^{i 2\pi f t} \, df \\
&= x(t), \quad t \in \mathbb{R},
\end{aligned}$$

where the first equality follows from Lemma 6.4.4 (i); the second because we have already established that the $L_2$-Fourier Transform of (the equivalence class of) $x$ is (the equivalence class of) $f \mapsto g(f) \, \mathrm{I}\{|f| \le W\}$; and where the last equality follows from (6.51).


In the engineering literature a function is often defined as bandlimited to $W$ Hz if its FT is zero for frequencies $f$ outside the interval $[-W, W]$. This definition is imprecise because the $L_2$-Fourier Transform of a signal is an equivalence class and its value at a given frequency is technically undefined. It would be better to define an energy-limited signal as bandlimited to $W$ Hz if $\|x\|_2^2 = \int_{-W}^{W} |\hat{x}(f)|^2 \, df$ so “all its energy is contained in the frequency band $[-W, W]$.” However, this is not quite equivalent to our definition. For example, the $L_2$-Fourier Transform of the discontinuous signal

$$x(t) = \begin{cases} 17 & \text{if } t = 0, \\ \operatorname{sinc}(2Wt) & \text{otherwise,} \end{cases}$$

is (the equivalence class of) the Brickwall (frequency domain) function

$$f \mapsto \frac{1}{2W} \, \mathrm{I}\{|f| \le W\}, \quad f \in \mathbb{R}$$

(because the discontinuity at $t = 0$ does not influence the Fourier integral), but the signal is altered by the lowpass filter, which smooths it out to produce the continuous waveform $t \mapsto \operatorname{sinc}(2Wt)$. Readers who have already seen the Sampling Theorem will note that the above signal $x(\cdot)$ provides a counterexample to the Sampling Theorem as it is often imprecisely stated.
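
The effect of the lowpass filter on this example can be seen numerically at $t = 0$. The sketch below is illustrative only: it assumes a signal that equals 17 at the single point $t = 0$ and $\operatorname{sinc}(2Wt)$ elsewhere, picks the cutoff $W = 1/2$ Hz for convenience, and approximates the convolution integral by a midpoint rule on a truncated interval. The function names are hypothetical.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

W = 0.5  # illustrative cutoff in Hz; with this choice LPF_W(t) = sinc(t)

def x(t):
    """Discontinuous signal: 17 at the single point t = 0, sinc(2Wt) elsewhere."""
    return 17.0 if t == 0 else sinc(2 * W * t)

def lowpass_output_at_zero(t_max=200.0, n=80000):
    """Midpoint-rule value of the integral of x(tau) LPF_W(0 - tau) dtau over [-t_max, t_max].

    The midpoints never land on tau = 0, which mirrors the fact that a single
    point has Lebesgue measure zero and cannot influence the integral.
    """
    dtau = 2 * t_max / n
    total = 0.0
    for k in range(n):
        tau = -t_max + (k + 0.5) * dtau
        total += x(tau) * 2 * W * sinc(2 * W * (0.0 - tau))
    return total * dtau

filtered_at_zero = lowpass_output_at_zero()
# The input is 17 at t = 0, but the filter output there is close to sinc(0) = 1.
print(x(0.0), filtered_at_zero)
```

The small residual gap from 1 comes from truncating the slowly decaying integrand, not from the value assigned at $t = 0$.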


The following proposition clarifies the relationship between this definition and ours. 


Proposition 6.4.6 (More on the Definition of Bandlimited Functions in $\mathcal{L}_2$).


(i) If $x$ is an energy-limited signal that is bandlimited to $W$ Hz, then $x$ is a continuous function and all its energy is contained in the frequency interval $[-W, W]$ in the sense that its $L_2$-Fourier Transform $\hat{x}$ satisfies

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df = \int_{-W}^{W} |\hat{x}(f)|^2 \, df. \tag{6.53}$$

(ii) If the signal $x \in \mathcal{L}_2$ satisfies (6.53), then $x$ is indistinguishable from the signal $x \star \mathrm{LPF}_W$, which is an energy-limited signal that is bandlimited to $W$ Hz. If in addition to satisfying (6.53) the signal $x$ is continuous, then $x$ is an energy-limited signal that is bandlimited to $W$ Hz.


Proof. This proposition’s claims are a subset of those of Proposition 6.4.7, which 
summarizes some of the results relating to lowpass filtering. The proof is therefore 
omitted. 


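Proposition 6.4.7. Let $y = x \star \mathrm{LPF}_W$ be the result of feeding the signal $x \in \mathcal{L}_2$ to an ideal unit-gain lowpass filter of cutoff frequency $W$. Then:


(i) $y$ is energy-limited with

$$\|y\|_2 \le \|x\|_2. \tag{6.54}$$

(ii) $y$ is an energy-limited signal that is bandlimited to $W$ Hz.


(iii) Its $L_2$-Fourier Transform $\hat{y}$ is given by (the equivalence class of) the mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$.


(iv) All the energy in $y$ is concentrated in the frequency band $[-W, W]$ in the sense that:

$$\int_{-\infty}^{\infty} |\hat{y}(f)|^2 \, df = \int_{-W}^{W} |\hat{y}(f)|^2 \, df.$$

(v) $y$ can be represented as

$$\begin{aligned}
y(t) &= \int_{-W}^{W} \hat{y}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R} && (6.55) \\
&= \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. && (6.56)
\end{aligned}$$

(vi) $y$ is uniformly continuous.


(vii) If $x \in \mathcal{L}_2$ has all its energy concentrated in the frequency band $[-W, W]$ in the sense that

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df = \int_{-W}^{W} |\hat{x}(f)|^2 \, df, \tag{6.57}$$

then $x$ is indistinguishable from the bandlimited signal $x \star \mathrm{LPF}_W$.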


(viii) $x$ is an energy-limited signal that is bandlimited to $W$ Hz if, and only if, it satisfies all three of the following conditions: it is in $\mathcal{L}_2$; it is continuous; and it satisfies (6.57).


Proof. Part (i) follows from Lemma 6.4.4 (i), which demonstrates that $\hat{y}$ is (the equivalence class of) the mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$ so, by Parseval’s Theorem,

$$\begin{aligned}
\|y\|_2^2 &= \int_{-\infty}^{\infty} |\hat{y}(f)|^2 \, df \\
&= \int_{-W}^{W} |\hat{x}(f)|^2 \, df \\
&\le \int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df \\
&= \|x\|_2^2.
\end{aligned}$$

Part (ii) follows because, by Lemma 6.4.4 (i), the signal $y$ satisfies

$$y(t) = \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df,$$

where

$$\int_{-W}^{W} |\hat{x}(f)|^2 \, df \le \int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df = \|x\|_2^2 < \infty,$$

so, by Proposition 6.4.5, $y$ is an energy-limited signal that is bandlimited to $W$ Hz.


Part (iii) follows directly from Lemma 6.4.4 (i). Part (iv) follows from Part (iii). Part (v) follows, again, directly from Lemma 6.4.4.

Part (vi) follows from the representation (6.56); from the fact that the IFT of integrable functions is uniformly continuous (Theorem 6.2.11); and because the condition $\|x\|_2 < \infty$ implies, by Proposition 3.4.3, that $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$ is integrable.

To prove Part (vii) we note that by Part (ii) $x \star \mathrm{LPF}_W$ is an energy-limited signal that is bandlimited to $W$ Hz, and we note that (6.57) implies that $x$ is indistinguishable from $x \star \mathrm{LPF}_W$ because

$$\begin{aligned}
\|x - x \star \mathrm{LPF}_W\|_2^2 &= \int_{-\infty}^{\infty} \bigl| \hat{x}(f) - \widehat{x \star \mathrm{LPF}_W}(f) \bigr|^2 \, df \\
&= \int_{-\infty}^{\infty} \bigl| \hat{x}(f) - \hat{x}(f) \, \mathrm{I}\{|f| \le W\} \bigr|^2 \, df \\
&= \int_{|f| > W} |\hat{x}(f)|^2 \, df \\
&= 0,
\end{aligned}$$

where the first equality follows from Parseval’s Theorem; the second equality from Lemma 6.4.4 (i); the third equality because the integrand is zero for $|f| \le W$; and the final equality from (6.57).


To prove Part (viii) define $y = x \star \mathrm{LPF}_W$ and note that if $x$ is an energy-limited signal that is bandlimited to $W$ Hz then, by Definition 6.4.1, $y = x$ so the continuity of $x$ and the fact that its energy is concentrated in the interval $[-W, W]$ follow from Parts (iv) and (vi). In the other direction, if $x$ satisfies (6.57) then by Part (vii) it is indistinguishable from the signal $y$, which is continuous by Part (vi). If, additionally, $x$ is continuous, then $x$ must be identical to $y$ because two continuous functions that are indistinguishable must be identical.


6.4.2 Integrable Signals 


We next discuss what we mean when we say that x is an integrable signal that is 
bandlimited to W Hz. Also important will be Note 6.4.11, which establishes that 
if x is such a signal, then x is equal to the IFT of its FT. 


Even though the ideal unit-gain lowpass filter is unstable, its convolution with any integrable signal is well-defined. Denoting the cutoff frequency by $W_c$ we have:


Proposition 6.4.8. For any $x \in \mathcal{L}_1$ the convolution integral

$$\int_{-\infty}^{\infty} x(\tau) \, \mathrm{LPF}_{W_c}(t - \tau) \, d\tau$$

is defined at every epoch $t \in \mathbb{R}$ and is given by

$$\int_{-\infty}^{\infty} x(\tau) \, \mathrm{LPF}_{W_c}(t - \tau) \, d\tau = \int_{-W_c}^{W_c} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{6.58}$$

Moreover, $x \star \mathrm{LPF}_{W_c}$ is an energy-limited function that is bandlimited to $W_c$ Hz. Its $L_2$-Fourier Transform is (the equivalence class of) the mapping

$$f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W_c\}.$$


Proof. The key to the proof is to note that, although the sinc(·) function is not integrable, it follows from (6.35) that it can be represented as the Inverse Fourier Transform of an integrable function (of frequency). Consequently, the existence of the convolution and its representation as (6.58) follow directly from Proposition 6.2.5 and (6.35).


To prove the remaining assertions of the proposition we note that, since $x$ is integrable, it follows from Theorem 6.2.11 that $|\hat{x}(f)| \le \|x\|_1$ and hence

$$\int_{-W_c}^{W_c} |\hat{x}(f)|^2 \, df < \infty. \tag{6.59}$$

The result now follows from (6.58), (6.59), and Proposition 6.4.5.


With the aid of Proposition 6.4.8 we can now define bandlimited integrable signals: 


Definition 6.4.9 (Bandlimited Integrable Signals). We say that the signal $x$ is an integrable signal that is bandlimited to $W$ Hz if $x$ is integrable and if it is unaltered when it is lowpass filtered by an ideal unit-gain lowpass filter of cutoff frequency $W$:

$$x(t) = (x \star \mathrm{LPF}_W)(t), \quad t \in \mathbb{R}.$$

Proposition 6.4.10 (Characterizing Integrable Signals that Are Bandlimited to $W$ Hz). If $x$ is an integrable signal, then each of the following statements is equivalent to the statement that $x$ is an integrable signal that is bandlimited to $W$ Hz:


(a) The signal $x$ is unaltered when it is lowpass filtered:

$$x(t) = (x \star \mathrm{LPF}_W)(t), \quad t \in \mathbb{R}. \tag{6.60}$$

(b) The signal $x$ can be expressed as

$$x(t) = \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{6.61}$$

(c) The signal $x$ is continuous and

$$\hat{x}(f) = 0, \quad |f| > W. \tag{6.62}$$

(d) There exists an integrable function $g$ such that

$$x(t) = \int_{-W}^{W} g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{6.63}$$

Proof. Condition (a) is the condition given in Definition 6.4.9, so it only remains to show that the four conditions are equivalent. We proceed to do so by proving that (a) ⇔ (b); that (b) ⇒ (d); that (d) ⇒ (c); and that (c) ⇒ (b).


That (a) ⇔ (b) follows directly from Proposition 6.4.8 and, more specifically, from the representation (6.58). The implication (b) ⇒ (d) is obvious because nothing precludes us from picking $g$ to be the mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$, which is integrable because $\hat{x}$ is bounded by $\|x\|_1$ (Theorem 6.2.11).


We next prove that (d) ⇒ (c). We thus assume that there exists an integrable function $g$ such that (6.63) holds and proceed to prove that $x$ is continuous and that (6.62) holds. To that end we first note that the integrability of $g$ implies, by Theorem 6.2.11, that $x$ (which is the IFT of $f \mapsto g(f) \, \mathrm{I}\{|f| \le W\}$) is continuous. It thus remains to prove that $\hat{x}$ satisfies (6.62). Define $g_0$ as the mapping $f \mapsto g(f) \, \mathrm{I}\{|f| \le W\}$. By (6.63) it then follows that $x = \check{g}_0$. Consequently,

$$\hat{x} = \hat{\check{g}}_0. \tag{6.64}$$

Employing Theorem 6.2.13 (ii) we conclude that the RHS of (6.64) is equal to $g_0$ outside a set of Lebesgue measure zero, so (6.64) implies that $\hat{x}$ is indistinguishable from $g_0$. Since both $\hat{x}$ and $g_0$ are continuous for $|f| > W$, this implies that $\hat{x}(f) = g_0(f)$ for all frequencies $|f| > W$. Since, by its definition, $g_0(f) = 0$ whenever $|f| > W$ we can conclude that (6.62) holds.


Finally (c) ⇒ (b) follows directly from Theorem 6.2.13 (i).


From Proposition 6.4.10 (cf. (b) and (c)) we obtain: 


Note 6.4.11. If $x$ is an integrable signal that is bandlimited to $W$ Hz, then it is equal to the IFT of its FT.

By Proposition 6.4.10 it also follows that if $x$ is an integrable signal that is bandlimited to $W$ Hz, then (6.61) is satisfied. Since the integrand in (6.61) is bounded (by $\|x\|_1$) it follows that the integrand is square integrable over the interval $[-W, W]$. Consequently, by Proposition 6.4.5, $x$ must be an energy-limited signal that is bandlimited to $W$ Hz. We have thus proved:


Note 6.4.12. An integrable signal that is bandlimited to W Hz is also an energy- 
limited signal that is bandlimited to W Hz. 


The reverse statement is not true: the sinc(·) function is an energy-limited signal that is bandlimited to 1/2 Hz, but it is not integrable.
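
Both halves of this statement can be observed numerically. The sketch below is an illustration, not a proof: it compares truncated integrals of $|\operatorname{sinc}(t)|$ and of $\operatorname{sinc}^2(t)$ over $[-T, T]$ for growing $T$, using a midpoint rule. The helper name `two_sided_integral` is hypothetical.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def two_sided_integral(g, t_max, dt=0.01):
    """Midpoint-rule value of the integral of an even function g over [-t_max, t_max]."""
    n = round(t_max / dt)
    return 2 * sum(g((k + 0.5) * dt) for k in range(n)) * dt

horizons = (10, 100, 1000)
abs_integrals = [two_sided_integral(lambda t: abs(sinc(t)), T) for T in horizons]
energies = [two_sided_integral(lambda t: sinc(t) ** 2, T) for T in horizons]

for T, a, e in zip(horizons, abs_integrals, energies):
    # The absolute integral keeps growing (roughly like log T), whereas the
    # energy settles near 1, the integral of the Brickwall I{|f| <= 1/2}.
    print(f"T = {T:5d}   integral of |sinc| = {a:.4f}   integral of sinc^2 = {e:.6f}")
```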


The definition of bandwidth for integrable signals is similar to Definition 6.4.3.¹⁰


Definition 6.4.13 (Bandwidth). The bandwidth of an integrable signal is the 
smallest frequency W to which it is bandlimited. 


6.5 Bandlimited Signals Through Stable Filters 


In this section we discuss the result of feeding bandlimited signals to stable filters. 
We begin with energy-limited signals. In Theorem 6.3.2 we saw that the convo- 
lution of an integrable signal with an energy-limited signal is defined at all times 
outside a set of Lebesgue measure zero. The next proposition shows that if the 
energy-limited signal is bandlimited to W Hz, then the convolution is defined at 
every time, and the result is an energy-limited signal that is bandlimited to W Hz. 


Proposition 6.5.1. Let $x$ be an energy-limited signal that is bandlimited to $W$ Hz and let $h$ be integrable. Then $x \star h$ is defined for every $t \in \mathbb{R}$; it is an energy-limited signal that is bandlimited to $W$ Hz; and it can be represented as

$$(x \star h)(t) = \int_{-W}^{W} \hat{x}(f) \, \hat{h}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{6.65}$$


Proof. Since $x$ is an energy-limited signal that is bandlimited to $W$ Hz, it follows from Proposition 6.4.5 that

$$x(t) = \int_{-W}^{W} \hat{x}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}, \tag{6.66}$$

with the mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$ being square integrable and hence, by Proposition 3.4.3, also integrable. Thus the convolution $x \star h$ is the convolution between the IFT of the integrable mapping $f \mapsto \hat{x}(f) \, \mathrm{I}\{|f| \le W\}$ and the integrable function $h$. By Proposition 6.2.5 we thus obtain that the convolution $x \star h$ is defined at every time $t$ and has the representation (6.65). The proposition will now follow from (6.65) and Proposition 6.4.5 once we demonstrate that

$$\int_{-W}^{W} \bigl| \hat{x}(f) \, \hat{h}(f) \bigr|^2 \, df < \infty.$$

This can be proved by upper-bounding $|\hat{h}(f)|$ by $\|h\|_1$ (Theorem 6.2.11) and by then using Parseval’s Theorem.


¹⁰ Again, we omit the proof that the infimum is a minimum.


We next turn to integrable signals passed through stable filters. 


Proposition 6.5.2 (Integrable Bandlimited Signals through Stable Filters). Let $x$ be an integrable signal that is bandlimited to $W$ Hz, and let $h$ be integrable. Then the convolution $x \star h$ is defined for every $t \in \mathbb{R}$; it is an integrable signal that is bandlimited to $W$ Hz; and it can be represented as

$$(x \star h)(t) = \int_{-W}^{W} \hat{x}(f) \, \hat{h}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{6.67}$$

Proof. Since every integrable signal that is bandlimited to $W$ Hz is also an energy-limited signal that is bandlimited to $W$ Hz, it follows from Proposition 6.5.1 that the convolution $x \star h$ is defined at every epoch and that it can be represented as (6.65). Alternatively, one can derive this representation from (6.61) and Proposition 6.2.5. It only remains to show that $x \star h$ is integrable, but this follows because the convolution of two integrable functions is integrable (5.9).


6.6 The Bandwidth of a Product of Two Signals 


In this section we discuss the bandwidth of the product of two bandlimited signals. 
The result is a straightforward consequence of the fact that the FT of a product 
of two signals is the convolution of their FTs. We begin with the following result 
on the FT of a product of signals. 


Proposition 6.6.1 (The FT of a Product Is the Convolution of the FTs). If $x_1$
and $x_2$ are energy-limited signals, then their product

\[ t \mapsto x_1(t)\, x_2(t) \]

is an integrable function whose FT is the mapping

\[ f \mapsto (\hat{x}_1 \star \hat{x}_2)(f). \]

Proof. Let $x_1$ and $x_2$ be energy-limited signals, and denote their product by y:

\[ y(t) = x_1(t)\, x_2(t), \quad t \in \mathbb{R}. \]

Since both $x_1$ and $x_2$ are square integrable, it follows from the Cauchy-Schwarz
Inequality that their product y is integrable and that

\[ \|y\|_1 \le \|x_1\|_2 \, \|x_2\|_2. \tag{6.68} \]

Having established that the product is integrable, we next derive its FT and show
that

\[ \hat{y}(f) = (\hat{x}_1 \star \hat{x}_2)(f), \quad f \in \mathbb{R}. \tag{6.69} \]

This is done by expressing $\hat{y}(f)$ as an inner product between two finite-energy
functions and by then using Parseval’s Theorem:

\[
\begin{aligned}
\hat{y}(f) &= \int_{-\infty}^{\infty} y(t)\, e^{-i2\pi ft} \, dt \\
&= \int_{-\infty}^{\infty} x_1(t)\, x_2(t)\, e^{-i2\pi ft} \, dt \\
&= \bigl\langle t \mapsto x_1(t),\; t \mapsto x_2^*(t)\, e^{i2\pi ft} \bigr\rangle \\
&= (\hat{x}_1 \star \hat{x}_2)(f), \quad f \in \mathbb{R}.
\end{aligned}
\]
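As a numerical sanity check of (6.69), the following sketch compares the two sides for one pair of signals. It assumes $x_1(t) = x_2(t) = e^{-\pi t^2}$, an example chosen here and not taken from the text; under the FT convention of this chapter this pulse is its own FT, and the FT of the product $e^{-2\pi t^2}$ is $e^{-\pi f^2/2}/\sqrt{2}$. The function names are illustrative only.

```python
import math

def gauss_hat(f):
    """FT of t -> exp(-pi t^2); this pulse is its own FT (assumed example)."""
    return math.exp(-math.pi * f * f)

def conv_of_fts(f, half_width=8.0, steps=4000):
    """Riemann-sum approximation of (x1_hat * x2_hat)(f) on [-half_width, half_width]."""
    du = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        u = -half_width + k * du
        total += gauss_hat(u) * gauss_hat(f - u)
    return total * du

def ft_of_product(f):
    """Closed-form FT of t -> exp(-2 pi t^2), the product of the two pulses."""
    return math.exp(-math.pi * f * f / 2.0) / math.sqrt(2.0)

# Print both sides of (6.69) at a few frequencies; they should agree closely.
for f in (0.0, 0.5, 1.3):
    print(f, conv_of_fts(f), ft_of_product(f))
```

The Gaussian is used only because both sides are available in closed form; it says nothing about bandlimited signals, which are treated next.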


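Proposition 6.6.2. Let $x_1$ and $x_2$ be energy-limited signals that are bandlimited to
$W_1$ Hz and $W_2$ Hz respectively. Then their product is an energy-limited signal that
is bandlimited to $W_1 + W_2$ Hz.

Proof. We will show that

\[ x_1(t)\, x_2(t) = \int_{-(W_1+W_2)}^{W_1+W_2} g(f)\, e^{i2\pi ft} \, df, \quad t \in \mathbb{R}, \tag{6.70} \]

where the function g(·) satisfies

\[ \int_{-(W_1+W_2)}^{W_1+W_2} |g(f)|^2 \, df < \infty. \tag{6.71} \]

The result will then follow from Proposition 6.4.5.

To establish (6.70) we begin by noting that since $x_1$ is of finite energy and bandlimited
to $W_1$ Hz we have by Proposition 6.4.5

\[ x_1(t) = \int_{-W_1}^{W_1} \hat{x}_1(f_1)\, e^{i2\pi f_1 t} \, df_1, \quad t \in \mathbb{R}. \]

Similarly,

\[ x_2(t) = \int_{-W_2}^{W_2} \hat{x}_2(f_2)\, e^{i2\pi f_2 t} \, df_2, \quad t \in \mathbb{R}. \]

Consequently,

\[
\begin{aligned}
x_1(t)\, x_2(t) &= \int_{-W_1}^{W_1} \hat{x}_1(f_1)\, e^{i2\pi f_1 t} \, df_1 \int_{-W_2}^{W_2} \hat{x}_2(f_2)\, e^{i2\pi f_2 t} \, df_2 \\
&= \int_{-W_1}^{W_1} \int_{-W_2}^{W_2} \hat{x}_1(f_1)\, \hat{x}_2(f_2)\, e^{i2\pi (f_1+f_2) t} \, df_2 \, df_1 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{x}_1(f_1)\, \hat{x}_2(f_2)\, e^{i2\pi (f_1+f_2) t} \, df_2 \, df_1 \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \hat{x}_2(f - \tilde{f})\, e^{i2\pi f t} \, d\tilde{f} \, df \\
&= \int_{-\infty}^{\infty} e^{i2\pi f t}\, (\hat{x}_1 \star \hat{x}_2)(f) \, df \\
&= \int_{-\infty}^{\infty} e^{i2\pi f t}\, g(f) \, df, \quad t \in \mathbb{R},
\end{aligned}
\tag{6.72}
\]

where

\[ g(f) = \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, \hat{x}_2(f - \tilde{f}) \, d\tilde{f}, \quad f \in \mathbb{R}. \tag{6.73} \]

Here the second equality follows from Fubini’s Theorem (see footnote 11); the third because $x_1$
and $x_2$ are bandlimited to $W_1$ and $W_2$ Hz respectively; and the fourth by introducing
the variables $f \triangleq f_1 + f_2$ and $\tilde{f} \triangleq f_1$.

To establish (6.70) we now need to show that because $x_1$ and $x_2$ are bandlimited
to $W_1$ and $W_2$ Hz respectively, it follows that

\[ g(f) = 0, \quad |f| > W_1 + W_2. \tag{6.74} \]

To prove this we note that because $x_1$ and $x_2$ are bandlimited to $W_1$ Hz and $W_2$
Hz respectively, we can rewrite (6.73) as

\[ g(f) = \int_{-\infty}^{\infty} \hat{x}_1(\tilde{f})\, I\{|\tilde{f}| \le W_1\}\; \hat{x}_2(f - \tilde{f})\, I\{|f - \tilde{f}| \le W_2\} \, d\tilde{f}, \quad f \in \mathbb{R}, \tag{6.75} \]

and the product $I\{|\tilde{f}| \le W_1\}\, I\{|f - \tilde{f}| \le W_2\}$ is zero (for every $\tilde{f}$) at all frequencies f satisfying
$|f| > W_1 + W_2$.

Having established (6.70) using (6.72) and (6.74), we now proceed to prove (6.71)
by showing that g is bounded. We do so by noting that
g is the convolution of two square-integrable functions ($\hat{x}_1$
and $\hat{x}_2$) so by (5.6b) (with the dummy variable now being f) we have

\[ |g(f)| \le \|\hat{x}_1\|_2 \, \|\hat{x}_2\|_2 = \|x_1\|_2 \, \|x_2\|_2 < \infty, \quad f \in \mathbb{R}. \]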
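A worked instance may make the support argument concrete. The signals below are an assumption made for illustration and do not appear in the text: take $x_1(t) = \mathrm{sinc}(2W_1 t)$ and $x_2(t) = \mathrm{sinc}(2W_2 t)$ with $W_1 \ge W_2 > 0$, whose FTs are the scaled indicators $\frac{1}{2W_k} I\{|f| \le W_k\}$. Evaluating (6.73) then yields a trapezoid:

```latex
g(f) = \frac{1}{4 W_1 W_2} \int_{-\infty}^{\infty}
         I\{|\tilde f| \le W_1\}\, I\{|f - \tilde f| \le W_2\}\, d\tilde f
     = \begin{cases}
         \dfrac{1}{2 W_1} & \text{if } |f| \le W_1 - W_2, \\[1ex]
         \dfrac{W_1 + W_2 - |f|}{4 W_1 W_2} & \text{if } W_1 - W_2 < |f| \le W_1 + W_2, \\[1ex]
         0 & \text{if } |f| > W_1 + W_2.
       \end{cases}
```

In this example g vanishes outside $[-(W_1+W_2),\, W_1+W_2]$, in agreement with (6.74), and its peak value $1/(2W_1)$ does not exceed $\|x_1\|_2 \|x_2\|_2 = 1/(2\sqrt{W_1 W_2})$, in agreement with the bound that closes the proof.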


11 The fact that $\int_{-W_1}^{W_1} |\hat{x}_1(f)| \, df$ is finite follows from the finiteness of $\int_{-W_1}^{W_1} |\hat{x}_1(f)|^2 \, df$ (which
follows from Parseval’s Theorem) and from Proposition 3.4.3. The same argument applies to $x_2$.


6.7 Bernstein’s Inequality


Bernstein’s Inequality captures the engineering intuition that the rate at which
a bandlimited signal can change is proportional to its bandwidth. The way the
theorem is phrased makes it clear that it is applicable both to integrable signals
that are bandlimited to W Hz and to energy-limited signals that are bandlimited
to W Hz.


Theorem 6.7.1 (Bernstein’s Inequality). If x can be written as

\[ x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi ft} \, df, \quad t \in \mathbb{R} \]

for some integrable function g, then

\[ \left| \frac{dx(t)}{dt} \right| \le 4\pi W \sup_{\tau \in \mathbb{R}} |x(\tau)|, \quad t \in \mathbb{R}. \tag{6.76} \]

Proof. A proof of a slightly more general version of this theorem can be found in 
(Pinsky, 2002, Chapter 2, Section 2.3.8). 
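The inequality can be probed numerically. The sketch below is an illustration under stated assumptions, not part of the text: it takes $x(t) = \mathrm{sinc}(2Wt)$, which has the required form with $g(f) = 1/(2W)$ on $[-W, W]$ and for which $\sup_\tau |x(\tau)| = 1$, and it estimates the largest slope on a finite grid with a central difference. The helper names, the grid, and the step size are arbitrary choices.

```python
import math

def sinc(s):
    """Normalized sinc: sin(pi s) / (pi s), with sinc(0) = 1."""
    return 1.0 if s == 0.0 else math.sin(math.pi * s) / (math.pi * s)

def max_slope(x, t_min, t_max, steps, h=1e-6):
    """Largest central-difference slope of x over an evenly spaced grid."""
    best = 0.0
    for k in range(steps + 1):
        t = t_min + (t_max - t_min) * k / steps
        slope = abs(x(t + h) - x(t - h)) / (2.0 * h)
        best = max(best, slope)
    return best

W = 3.0                                   # bandwidth in Hz (assumed value)
x = lambda t: sinc(2.0 * W * t)           # bandlimited to W Hz, sup |x| = 1
slope = max_slope(x, -2.0, 2.0, 40000)
bound = 4.0 * math.pi * W * 1.0           # RHS of (6.76) with sup |x| = 1
print(slope, bound)
```

For this particular signal the observed slope stays well below the bound; the check illustrates the inequality and says nothing about whether the constant can be improved.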


6.8 Time-Limited and Bandlimited Signals 


In this section we prove that no nonzero signal can be both time-limited and 
bandlimited. We shall present two proofs. The first is based on Theorem 6.8.1, 
which establishes a connection between bandlimited signals and entire functions. 
The second is based on the Fourier Series. 


We remind the reader that a function $\xi\colon \mathbb{C} \to \mathbb{C}$ is an entire function if it is
analytic throughout the complex plane.


Theorem 6.8.1. If x is an energy-limited signal that is bandlimited to W Hz, then
there exists an entire function $\xi\colon \mathbb{C} \to \mathbb{C}$ that agrees with x on the real axis

\[ \xi(t + i0) = x(t), \quad t \in \mathbb{R} \tag{6.77} \]

and that satisfies

\[ |\xi(z)| \le \gamma\, e^{2\pi W |z|}, \quad z \in \mathbb{C}, \tag{6.78} \]

where $\gamma$ is some constant that can be taken as $\sqrt{2W}\, \|x\|_2$.


Proof. Let x be an energy-limited signal that is bandlimited to W Hz. By Proposition
6.4.5 we can express x as

\[ x(t) = \int_{-W}^{W} g(f)\, e^{i2\pi ft} \, df, \quad t \in \mathbb{R} \tag{6.79} \]

for some square-integrable function g satisfying

\[ \int_{-W}^{W} |g(f)|^2 \, df = \|x\|_2^2. \tag{6.80} \]

Consider now the function $\xi\colon \mathbb{C} \to \mathbb{C}$ defined by

\[ \xi(z) = \int_{-W}^{W} g(f)\, e^{i2\pi fz} \, df, \quad z \in \mathbb{C}. \tag{6.81} \]

This function is well-defined for every $z \in \mathbb{C}$ because in the region of integration
the integrand can be bounded by

\[
\begin{aligned}
\bigl| g(f)\, e^{i2\pi fz} \bigr| &= |g(f)|\, e^{-2\pi f\, \mathrm{Im}(z)} \\
&\le |g(f)|\, e^{2\pi |f|\, |\mathrm{Im}(z)|} \\
&\le |g(f)|\, e^{2\pi W |z|}, \quad |f| \le W,
\end{aligned}
\tag{6.82}
\]

and the RHS of (6.82) is integrable over the interval $[-W, W]$ by (6.80) and Proposition
3.4.3.


By (6.79) and (6.81) it follows that $\xi$ is an extension of the function x in the sense
of (6.77). It is but a technical matter to prove that $\xi$ is analytic. One approach is
to prove that it is differentiable at every $z \in \mathbb{C}$ by verifying that the swapping of
differentiation and integration, which leads to

\[ \frac{d\xi}{dz}(z) = \int_{-W}^{W} g(f)\, (i2\pi f)\, e^{i2\pi fz} \, df, \quad z \in \mathbb{C}, \]

is justified. See (Rudin, 1974, Section 19.1) for a different approach.

To prove (6.78) we compute

\[
\begin{aligned}
|\xi(z)| &= \left| \int_{-W}^{W} g(f)\, e^{i2\pi fz} \, df \right| \\
&\le \int_{-W}^{W} \bigl| g(f)\, e^{i2\pi fz} \bigr| \, df \\
&\le e^{2\pi W |z|} \int_{-W}^{W} |g(f)| \, df \\
&\le e^{2\pi W |z|}\, \sqrt{2W}\, \sqrt{\int_{-W}^{W} |g(f)|^2 \, df} \\
&= \sqrt{2W}\, \|x\|_2\, e^{2\pi W |z|},
\end{aligned}
\]

where the inequality in the second line follows from Proposition 2.4.1; the inequality
in the third line from (6.82); the inequality in the fourth line from Proposition 3.4.3;
and the final equality from (6.80).
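For a concrete instance (an illustration chosen here, not taken from the text), let $x(t) = \mathrm{sinc}(2Wt)$, so that $g(f) = 1/(2W)$ on $[-W, W]$ and $\|x\|_2^2 = 1/(2W)$. The extension (6.81) can then be written in closed form and bounded directly:

```latex
\xi(z) = \frac{1}{2W} \int_{-W}^{W} e^{i 2\pi f z} \, df
       = \frac{\sin(2\pi W z)}{2\pi W z}, \quad z \in \mathbb{C} \setminus \{0\},
\qquad \xi(0) = 1,
\\[1ex]
|\xi(z)| \le \frac{1}{2W} \int_{-W}^{W} e^{2\pi |f|\,|\mathrm{Im}(z)|} \, df
         \le e^{2\pi W |z|}, \quad z \in \mathbb{C}.
```

Here $\gamma = \sqrt{2W}\,\|x\|_2 = 1$, so the direct bound coincides with (6.78) for this signal.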


Using Theorem 6.8.1 we can now easily prove the main result of this section. 


Theorem 6.8.2. Let W and T be fixed nonnegative real numbers. If x is an energy-limited
signal that is bandlimited to W Hz and that is time-limited in the sense that
it is zero for all $t \notin [-T/2, T/2]$, then $x(t) = 0$ for all $t \in \mathbb{R}$.


By Note 6.4.12 this theorem also holds for integrable bandlimited signals.


Proof. By Theorem 6.8.1 x can be extended to an entire function $\xi$. Since x has
infinitely many zeros in a bounded interval (e.g., for all $t \in [T, 2T]$) and since $\xi$
agrees with x on the real line, it follows that $\xi$ also has infinitely many zeros
in a bounded set (e.g., whenever $z \in \{ w \in \mathbb{C} : \mathrm{Im}(w) = 0,\ \mathrm{Re}(w) \in [T, 2T] \}$).
Consequently, $\xi$ is an entire function that has infinitely many zeros in a bounded
subset of the complex plane and is thus the all-zero function (Rudin, 1974, Theorem
10.18). But since x and $\xi$ agree on the real line, it follows that x is also the
all-zero function.
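A familiar example is consistent with the theorem. The signal below is chosen for illustration and is not taken from the text: the rectangular pulse of duration $T > 0$ is time-limited and of finite energy, and its FT is a sinc:

```latex
x(t) = I\{|t| \le T/2\}
\quad\Longrightarrow\quad
\hat{x}(f) = \int_{-T/2}^{T/2} e^{-i 2\pi f t} \, dt = T \,\mathrm{sinc}(T f), \quad f \in \mathbb{R}.
```

This FT vanishes only at the nonzero integer multiples of $1/T$, so it is nonzero at arbitrarily high frequencies, and no finite W makes the pulse bandlimited to W Hz.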
Another proof can be based on the Fourier Series, which is discussed in the appendix.
Starting from (6.79) we obtain that the time-$\eta/(2W)$ sample of x(·) satisfies

\[ \frac{1}{\sqrt{2W}}\, x\Bigl(\frac{\eta}{2W}\Bigr) = \int_{-W}^{W} g(f)\, \frac{1}{\sqrt{2W}}\, e^{i2\pi f \eta/(2W)} \, df, \quad \eta \in \mathbb{Z}, \]

where we recognize the RHS of the above as the $\eta$-th Fourier Series Coefficient of
the function $f \mapsto g(f)\, I\{|f| \le W\}$ with respect to the interval $[-W, W)$ (Note A.3.5
on Page 693). But since $x(t) = 0$ whenever $|t| > T/2$, it follows that only a finite
number of these samples can be nonzero, thus leading us to conclude that all but a
finite number of the Fourier Series Coefficients of g(·) are zero. By the uniqueness
theorem for the Fourier Series (Theorem A.2.3) it follows that g(·) is equal to a
trigonometric polynomial (except possibly on a set of measure zero). Thus,

\[ g(f) = \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi \eta f/(2W)}, \quad f \in [-W, W] \setminus \mathcal{N}, \tag{6.83} \]

for some $n \in \mathbb{N}$; for some $2n + 1$ complex numbers $a_{-n}, \ldots, a_n$; and for some set
$\mathcal{N} \subset [-W, W]$ of Lebesgue measure zero. Since the integral in (6.79) is insensitive
to the behavior of g on the set $\mathcal{N}$, it follows from (6.79) and (6.83) that

\[
\begin{aligned}
x(t) &= \int_{-W}^{W} e^{i2\pi ft} \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi \eta f/(2W)} \, df \\
&= \sum_{\eta=-n}^{n} a_\eta \int_{-\infty}^{\infty} e^{i2\pi f \left(t + \frac{\eta}{2W}\right)}\, I\{|f| \le W\} \, df \\
&= 2W \sum_{\eta=-n}^{n} a_\eta\, \mathrm{sinc}(2Wt + \eta), \quad t \in \mathbb{R},
\end{aligned}
\]

i.e., that x is a linear combination of a finite number of time-shifted sinc(·) functions.
It now remains to show that no linear combination of a finite number of
time-shifted sinc(·) functions can be zero for all $t \in [T, 2T]$ unless it is zero for
all $t \in \mathbb{R}$. This can be established by extending the sincs to entire functions so
that the linear combination of the time-shifted sinc(·) functions is also an entire
function and by then calling again on the theorem that an entire function that has
infinitely many zeros in a bounded subset of the complex plane must be the all-zero
function.


6.9 A Theorem by Paley and Wiener 


The theorem of Paley and Wiener that we discuss next is important in the study 
of bandlimited functions, but it will not be used in this book. 


Theorem 6.8.1 showed that every energy-limited signal x that is bandlimited to W
Hz can be extended to an entire function $\xi$ satisfying (6.78) for some constant $\gamma$
by defining $\xi(z)$ as

\[ \xi(z) = \int_{-W}^{W} \hat{x}(f)\, e^{i2\pi fz} \, df, \quad z \in \mathbb{C}. \tag{6.84} \]

The theorem of Paley and Wiener that we present next can be viewed as the
reverse statement. It demonstrates that if $\xi\colon \mathbb{C} \to \mathbb{C}$ is an entire function that
satisfies (6.78) and whose restriction to the real axis is square integrable, then its
restriction to the real axis is an energy-limited signal that is bandlimited to W Hz
and, moreover, if we denote this restriction by x so $x(t) = \xi(t + i0)$ for all $t \in \mathbb{R}$,
then $\xi$ is given by (6.84). This theorem demonstrates the close connection between
entire functions satisfying (6.78) (functions that are called entire functions of
exponential type) and energy-limited signals that are bandlimited to W Hz.


Theorem 6.9.1 (Paley-Wiener). If for some positive constants W and $\gamma$ the entire
function $\xi\colon \mathbb{C} \to \mathbb{C}$ satisfies

\[ |\xi(z)| \le \gamma\, e^{2\pi W |z|}, \quad z \in \mathbb{C} \tag{6.85} \]

and if

\[ \int_{-\infty}^{\infty} |\xi(t + i0)|^2 \, dt < \infty, \tag{6.86} \]

then there exists an energy-limited function $g\colon \mathbb{R} \to \mathbb{C}$ such that

\[ \xi(z) = \int_{-W}^{W} g(f)\, e^{i2\pi fz} \, df, \quad z \in \mathbb{C}. \tag{6.87} \]


Proof. See for example, (Rudin, 1974, Theorem 19.3) or (Katznelson, 1976, Chap- 
ter VI, Section 7) or (Dym and McKean, 1972, Section 3.3). 


6.10 Picket Fences and Poisson Summation 


Engineering textbooks often contain a useful expression for the FT of an infinite
series of equally-spaced Dirac’s Deltas. Very roughly, the result is that the FT of
the mapping

\[ t \mapsto \sum_{j=-\infty}^{\infty} \delta(t + jT_s) \]

is the mapping

\[ f \mapsto \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \delta\Bigl(f + \frac{\eta}{T_s}\Bigr), \]

where $\delta(\cdot)$ denotes Dirac’s Delta. Needless to say, we are being extremely informal
because we said nothing about convergence. This result is sometimes called the
picket-fence miracle, because if we envision the plot of Dirac’s Delta as an
upward pointing bold arrow stemming from the origin, then the plot of a sum of
shifted Delta’s resembles a picket fence. The picket-fence miracle is that the FT
of a picket fence is yet another scaled picket fence; see (Oppenheim and Willsky,
1997, Chapter 4, Example 4.8 and also Chapter 7, Section 7.1.1.) or (Kwakernaak
and Sivan, 1991, Chapter 7, Example 7.4.19(c)).
In the mathematical literature, this result is called “the Poisson summation formula.”
It states that under certain conditions on the function $\psi \in \mathcal{L}_1$,

\[ \sum_{j=-\infty}^{\infty} \psi(jT_s) = \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr). \tag{6.88} \]

To identify the roots of (6.88) define the mapping

\[ \phi(t) = \sum_{j=-\infty}^{\infty} \psi(t + jT_s), \tag{6.89} \]

and note that this function is periodic in the sense that $\phi(t + T_s) = \phi(t)$ for every
$t \in \mathbb{R}$. Consequently, it is instructive to study its Fourier Series on the interval
$[-T_s/2, T_s/2)$ (Note A.3.5 in the appendix). Its $\eta$-th Fourier Series Coefficient with
respect to the interval $[-T_s/2, T_s/2)$ is given by

\[
\begin{aligned}
\int_{-T_s/2}^{T_s/2} \phi(t)\, \frac{1}{\sqrt{T_s}}\, e^{-i2\pi \eta t/T_s} \, dt
&= \frac{1}{\sqrt{T_s}} \int_{-T_s/2}^{T_s/2} \sum_{j=-\infty}^{\infty} \psi(t + jT_s)\, e^{-i2\pi \eta t/T_s} \, dt \\
&= \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} \int_{-T_s/2 + jT_s}^{T_s/2 + jT_s} \psi(\tau)\, e^{-i2\pi \eta (\tau - jT_s)/T_s} \, d\tau \\
&= \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} \int_{-T_s/2 + jT_s}^{T_s/2 + jT_s} \psi(\tau)\, e^{-i2\pi \eta \tau/T_s} \, d\tau \\
&= \frac{1}{\sqrt{T_s}} \int_{-\infty}^{\infty} \psi(\tau)\, e^{-i2\pi \eta \tau/T_s} \, d\tau \\
&= \frac{1}{\sqrt{T_s}}\, \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr), \quad \eta \in \mathbb{Z},
\end{aligned}
\]

where the first equality follows from the definition of $\phi(\cdot)$ (6.89); the second by
swapping the summation and the integration and by defining $\tau \triangleq t + jT_s$; the third
by the periodicity of the complex exponential; the fourth because summing the
integrals over disjoint intervals whose union is $\mathbb{R}$ is just the integral over $\mathbb{R}$; and
the final equality from the definition of the FT.


We can thus interpret the RHS of (6.88) as the evaluation (see footnote 12) at t = 0 of the Fourier
Series of $\phi(\cdot)$ and the LHS as the evaluation of $\phi(\cdot)$ at t = 0. Having established
the origin of the Poisson summation formula, we can now readily state conditions
that guarantee that it holds. An example of a set of conditions that guarantees
(6.88) is the following:


1) The function $\psi(\cdot)$ is integrable.

2) The RHS of (6.89) converges at t = 0.

3) The Fourier Series of $\phi(\cdot)$ converges at t = 0 to the value of $\phi(\cdot)$ at t = 0.


12 At t = 0 the complex exponentials are all equal to one, and the Fourier Series is thus just
the sum of the Fourier Series Coefficients (scaled by $1/\sqrt{T_s}$ under the normalization used above).

We draw the reader’s attention to the fact that it is not enough that both sides of
(6.88) converge absolutely and that both $\psi(\cdot)$ and $\hat{\psi}(\cdot)$ be continuous; see (Katznelson,
1976, Chapter VI, Section 1, Exercise 15).
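The identity (6.88) is easy to test numerically for a well-behaved pulse. The sketch below assumes $\psi(t) = e^{-\pi t^2}$, an example chosen here and not taken from the text; this pulse is its own FT under the convention of this chapter, and the Poisson summation formula is known to hold for it. The function names and the truncation of the sums to 101 terms are illustrative choices.

```python
import math

def psi(t):
    """Gaussian pulse psi(t) = exp(-pi t^2) (assumed example)."""
    return math.exp(-math.pi * t * t)

def psi_hat(f):
    """FT of the Gaussian pulse above; it equals exp(-pi f^2)."""
    return math.exp(-math.pi * f * f)

def poisson_sides(Ts, terms=50):
    """Truncated LHS and RHS of (6.88) for sampling interval Ts."""
    lhs = sum(psi(j * Ts) for j in range(-terms, terms + 1))
    rhs = sum(psi_hat(n / Ts) for n in range(-terms, terms + 1)) / Ts
    return lhs, rhs

# The two sides should agree up to floating-point rounding.
for Ts in (0.5, 1.0, 2.0):
    lhs, rhs = poisson_sides(Ts)
    print(Ts, lhs, rhs)
```

Because the Gaussian decays so quickly, the truncation error is far below floating-point precision; a slowly decaying $\psi$ would need many more terms, or might violate the conditions listed above.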


A setting where the above conditions are satisfied and where (6.88) thus holds is 
given in the following proposition. 


Proposition 6.10.1. Let $\psi(\cdot)$ be a continuous function satisfying

\[ \psi(t) = \begin{cases} 0 & \text{if } |t| \ge T, \\ \int_{-T}^{t} \xi(\tau) \, d\tau & \text{otherwise,} \end{cases} \tag{6.90a} \]

where

\[ \int_{-T}^{T} |\xi(\tau)|^2 \, d\tau < \infty, \tag{6.90b} \]

and where T > 0 is some constant. Then for any $T_s > 0$

\[ \sum_{j=-\infty}^{\infty} \psi(jT_s) = \frac{1}{T_s} \sum_{\eta=-\infty}^{\infty} \hat{\psi}\Bigl(\frac{\eta}{T_s}\Bigr). \tag{6.90c} \]


Proof. The integrability of $\psi(\cdot)$ follows because $\psi(\cdot)$ is continuous and zero outside
a finite interval. That the RHS of (6.89) converges at t = 0 follows because the
fact that $\psi(\cdot)$ is zero outside the interval $[-T, +T]$ implies that only a finite number
of terms contribute to the sum at t = 0. That the Fourier Series of $\phi(\cdot)$ converges
at t = 0 to the value of $\phi(\cdot)$ at t = 0 follows from (Katznelson, 1976, Chapter 1,
Section 6, Paragraph 6.2, Equation (6.2)) and from the corollary in (Katznelson,
1976, Chapter 1, Section 3, Paragraph 3.1).


6.11 Additional Reading 


There are a number of excellent books on Fourier Analysis. We mention here
(Katznelson, 1976), (Dym and McKean, 1972), (Pinsky, 2002), and (Körner, 1988).
In particular, readers who would like to better understand how the FT is defined for
energy-limited functions that are not integrable may wish to consult (Katznelson,
1976, Section VI 3.1) or (Dym and McKean, 1972, Sections 2.3-2.5). Numerous
surprising applications of the FT can be found in (Körner, 1988).


Engineers often speak of the 2WT degrees of freedom that signals that are band- 
limited and time-limited have. A good starting point for the literature on this is 
(Slepian, 1976). 


Bandlimited functions are intimately related to “entire functions of exponential 
type.” For an accessible introduction to this concept see (Requicha, 1980); for a 
more mathematical approach see (Boas, 1954). 




6.12 Exercises 


Exercise 6.1 (Symmetries of the FT). Let $x\colon \mathbb{R} \to \mathbb{C}$ be integrable, and let $\hat{x}$ be its FT.


(i) Show that if x is a real signal, then $\hat{x}$ is conjugate symmetric, i.e., $\hat{x}(-f) = \hat{x}^*(f)$,
for every $f \in \mathbb{R}$.

(ii) Show that if x is purely imaginary (i.e., takes on only purely imaginary values),
then $\hat{x}$ is conjugate antisymmetric, i.e., $\hat{x}(-f) = -\hat{x}^*(f)$, for every $f \in \mathbb{R}$.

(iii) Show that x can be written uniquely as the sum of a conjugate-symmetric function
$g_{cs}$ and a conjugate-antisymmetric function $g_{cas}$. Express $g_{cs}$ and $g_{cas}$ in terms of $\hat{x}$.


Exercise 6.2 (Reconstructing a Function from Its IFT). Formulate and prove a result 
analogous to Theorem 6.2.12 for the Inverse Fourier Transform. 


Exercise 6.3 (Eigenfunctions of the FT). Show that if the energy-limited signal x satisfies
$\hat{x} = \lambda x$ for some $\lambda \in \mathbb{C}$, then $\lambda$ can only be $\pm 1$ or $\pm i$. (The Hermite functions are such
signals.)


Exercise 6.4 (Existence of a Stable Filter (1)). Let W > 0 be given. Does there exist a
stable filter whose frequency response is zero for $|f| \le W$ and is one for $W < f \le 2W$?


Exercise 6.5 (Existence of a Stable Filter (2)). Let W > 0 be given. Does there exist a
stable filter whose frequency response is given by cos(f) for all |f| > W?


Exercise 6.6 (Existence of an Energy-Limited Signal). Argue that there exists an energy-limited
signal x whose FT is (the equivalence class of) the mapping $f \mapsto e^{-f}\, I\{f \ge 0\}$.
What is the energy in x? What is the energy in the result of feeding x to an ideal unit-gain
lowpass filter of cutoff frequency $W_c = 1$?


Exercise 6.7 (Passive Filters). Let h be the impulse response of a stable filter. Show that
the condition that “for every $x \in \mathcal{L}_2$ the energy in $x \star h$ does not exceed the energy in x”
is equivalent to the condition

\[ |\hat{h}(f)| \le 1, \quad f \in \mathbb{R}. \]


Exercise 6.8 (Real and Imaginary Parts of Bandlimited Signals). Show that if x(-) is an 
integrable signal that is bandlimited to W Hz, then its real and imaginary parts are also 
integrable signals that are bandlimited to W Hz. 


Exercise 6.9 (Inner Products and Filtering). Let x be an energy-limited signal that is
bandlimited to W Hz. Show that

\[ \langle x, y \rangle = \langle x,\, y \star \mathrm{LPF}_W \rangle, \quad y \in \mathcal{L}_2. \]


Exercise 6.10 (Squaring a Signal). Show that if x is an energy-limited signal that is
bandlimited to W Hz, then $t \mapsto x^2(t)$ is an integrable signal that is bandlimited to 2W
Hz.


Exercise 6.11 (Squared sinc(·)). Find the FT and IFT of the mapping $t \mapsto \mathrm{sinc}^2(t)$.


Exercise 6.12 (A Stable Filter). Show that the IFT of the function

\[ g\colon f \mapsto \begin{cases} 1 & \text{if } |f| \le a, \\ \dfrac{b - |f|}{b - a} & \text{if } a < |f| < b, \\ 0 & \text{otherwise} \end{cases} \]

is given by

\[ \check{g}\colon t \mapsto \frac{1}{(\pi t)^2}\, \frac{\cos(2\pi a t) - \cos(2\pi b t)}{2(b - a)} \]

and that this signal is integrable. Here b > a > 0.


Exercise 6.13 (Multiplying Bandlimited Signals by a Carrier). Let x be an integrable
signal that is bandlimited to W Hz.


(i) Show that if $f_c > W$, then

\[ \int_{-\infty}^{\infty} x(t) \cos(2\pi f_c t) \, dt = \int_{-\infty}^{\infty} x(t) \sin(2\pi f_c t) \, dt = 0. \]

(ii) Show that if $f_c > W/2$, then

\[ \int_{-\infty}^{\infty} x(t) \cos^2(2\pi f_c t) \, dt = \frac{1}{2} \int_{-\infty}^{\infty} x(t) \, dt. \]


Exercise 6.14 (An Identity). Prove that for every $W \in \mathbb{R}$

\[ \mathrm{sinc}(2Wt) \cos(2\pi Wt) = \mathrm{sinc}(4Wt), \quad t \in \mathbb{R}. \]

Illustrate the identity in the frequency domain.


Exercise 6.15 (Picket Fences). If you are familiar with Dirac’s Delta, explain how (6.88) is
related to the heuristic statement that the FT of $\sum_{j \in \mathbb{Z}} \delta(t + jT_s)$ is $T_s^{-1} \sum_{\eta \in \mathbb{Z}} \delta(f + \eta/T_s)$.


Exercise 6.16 (Bounding the Derivative). Show that if x is an energy-limited signal that
is bandlimited to W Hz, then its time-t derivative $x'(t)$ satisfies

\[ |x'(t)| \le \sqrt{\frac{8}{3}}\; \pi\, W^{3/2}\, \|x\|_2, \quad t \in \mathbb{R}. \]

Hint: Use Proposition 6.4.5 and the Cauchy-Schwarz Inequality.


Exercise 6.17 (Another Notion of Bandwidth). Let $\mathcal{U}$ denote the set of all energy-limited
signals u such that at least 90% of the energy of u is contained in the band $[-W, W]$.
Is $\mathcal{U}$ a linear subspace of $\mathcal{L}_2$?


Chapter 7 


Passband Signals and Their Representation 


7.1 Introduction 


The signals encountered in wireless communications are typically real passband 
signals. In this chapter we shall define such signals and define their bandwidth 
around a carrier frequency. We shall then explain how such signals can be rep- 
resented using their complex baseband representation. We shall emphasize two 
relationships: that between the energy in the passband signal and in its baseband 
representation, and that between the bandwidth of the passband signal around the 
carrier frequency and the bandwidth of its baseband representation. We ask the 
reader to pay special attention to the fact that only real passband signals have a 
baseband representation. 


Most of the chapter deals with the family of integrable passband signals. As we 
shall see in Corollary 7.2.4, an integrable passband signal must have finite energy, 
and this family is thus a subset of the family of energy-limited passband signals. 
Restricting ourselves to integrable signals—while reducing the generality of some of 
the results—simplifies the exposition because we can discuss the Fourier Transform 
without having to resort to the L2-Fourier Transform, which requires all statements 
to be phrased in terms of equivalence classes. But most of the derived results will 
also hold for the more general family of energy-limited passband signals with only 
slight modifications. The required modifications are discussed in Section 7.7. 


7.2 Baseband and Passband Signals 


Integrable signals that are bandlimited to W Hz were defined in Definition 6.4.9. By
Proposition 6.4.10, an integrable signal x is bandlimited to W Hz if it is continuous
and if its FT is zero for all frequencies outside the band $[-W, W]$. The bandwidth
of x is the smallest W to which it is bandlimited (Definition 6.4.13). As an example,
Figure 7.1 depicts the FT $\hat{x}$ of a real signal x, which is bandlimited to W Hz.
Since the signal x in this example is real, its FT is conjugate-symmetric (i.e.,
$\hat{x}(-f) = \hat{x}^*(f)$ for all frequencies $f \in \mathbb{R}$). Thus, the magnitude of $\hat{x}$ is symmetric
(even), i.e., $|\hat{x}(f)| = |\hat{x}(-f)|$, but its phase is anti-symmetric (odd). In the figure
dashed lines indicate this conjugate symmetry.
Figure 7.2: The FT $\hat{y}$ of a real passband signal y that is bandlimited to W Hz
around the carrier frequency $f_c$.


Consider now the real signal y whose FT $\hat{y}$ is depicted in Figure 7.2. Again, since
the signal is real, its FT is conjugate-symmetric, and hence the dashed lines. This
signal (if continuous) is bandlimited to $f_c + W/2$ Hz. But note that $\hat{y}(f) = 0$ for all
frequencies f in the interval $|f| < f_c - W/2$. Signals such as y are often encountered
in wireless communication, because in a wireless channel the very-low frequencies
often suffer severe attenuation and are therefore seldom used. Another reason
is the concurrent use of the wireless spectrum by many systems. If all systems
transmitted in the same frequency band, they would interfere with each other.
Consequently, different systems are often assigned different carrier frequencies so
that their transmitted signals will not overlap in frequency. This is why different
radio stations transmit around different carrier frequencies.


7.2.1 Definition and Characterization

To describe signals such as y we use the following definition for passband signals.


We ask the reader to recall the definition of the impulse response $\mathrm{BPF}_{W,f_c}(\cdot)$ (see
(5.21)) and of the frequency response $\widehat{\mathrm{BPF}}_{W,f_c}(\cdot)$ (see (6.41)) of the ideal unit-gain
bandpass filter of bandwidth W around the carrier frequency $f_c$.


Definition 7.2.1 (A Passband Signal). A signal $x_{\mathrm{PB}}$ is said to be an integrable
passband signal that is bandlimited to W Hz around the carrier frequency
$f_c$ if it is integrable

\[ x_{\mathrm{PB}} \in \mathcal{L}_1; \tag{7.1a} \]

the carrier frequency $f_c$ satisfies

\[ f_c > \frac{W}{2} > 0; \tag{7.1b} \]

and if $x_{\mathrm{PB}}$ is unaltered when it is fed to an ideal unit-gain bandpass filter of bandwidth
W around the carrier frequency $f_c$

\[ x_{\mathrm{PB}}(t) = \bigl(x_{\mathrm{PB}} \star \mathrm{BPF}_{W,f_c}\bigr)(t), \quad t \in \mathbb{R}. \tag{7.1c} \]

An energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$ is analogously defined but with (7.1a) replaced by the
condition

\[ x_{\mathrm{PB}} \in \mathcal{L}_2. \tag{7.1a'} \]

(That the convolution in (7.1c) is defined at every $t \in \mathbb{R}$ whenever $x_{\mathrm{PB}}$ is integrable
can be shown using Proposition 6.2.5 because $\mathrm{BPF}_{W,f_c}$ is the Inverse Fourier Transform
of the integrable function $f \mapsto I\{\,||f| - f_c| \le W/2\,\}$. That the convolution is
defined at every $t \in \mathbb{R}$ also when $x_{\mathrm{PB}}$ is of finite energy can be shown by noting
that $\mathrm{BPF}_{W,f_c}$ is of finite energy, and the convolution of two finite-energy signals is
defined at every time $t \in \mathbb{R}$; see Section 5.5.)


In analogy to Proposition 6.4.10 we have the following characterization: 


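Proposition 7.2.2 (Characterizing Integrable Passband Signals). Let $f_c$ and W
satisfy $f_c > W/2 > 0$. If $x_{\mathrm{PB}}$ is an integrable signal, then each of the following
statements is equivalent to the statement that $x_{\mathrm{PB}}$ is an integrable passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$.


(a) The signal $x_{\mathrm{PB}}$ is unaltered when it is bandpass filtered:

\[ x_{\mathrm{PB}}(t) = \bigl(x_{\mathrm{PB}} \star \mathrm{BPF}_{W,f_c}\bigr)(t), \quad t \in \mathbb{R}. \tag{7.2} \]

(b) The signal $x_{\mathrm{PB}}$ can be expressed as

\[ x_{\mathrm{PB}}(t) = \int_{||f| - f_c| \le W/2} \hat{x}_{\mathrm{PB}}(f)\, e^{i2\pi ft} \, df, \quad t \in \mathbb{R}. \tag{7.3} \]

(c) The signal $x_{\mathrm{PB}}$ is continuous and

\[ \hat{x}_{\mathrm{PB}}(f) = 0, \quad \bigl||f| - f_c\bigr| > \frac{W}{2}. \tag{7.4} \]

(d) There exists an integrable function g such that

\[ x_{\mathrm{PB}}(t) = \int_{||f| - f_c| \le W/2} g(f)\, e^{i2\pi ft} \, df, \quad t \in \mathbb{R}. \tag{7.5} \]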

Proof. The proof is similar to the proof of Proposition 6.4.10 and is omitted. 
7.2.2 Important Properties


By comparing (7.4) with (6.62) we obtain:


Corollary 7.2.3 (Passband Signals Are Bandlimited). If $x_{\mathrm{PB}}$ is an integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, then it is
an integrable signal that is bandlimited to $f_c + W/2$ Hz.


Using Corollary 7.2.3 and Note 6.4.12 we obtain:


Corollary 7.2.4 (Integrable Passband Signals Are of Finite Energy). Any integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$
is of finite energy.


Proposition 7.2.5 (Integrable Passband Signals through Stable Filters). If $x_{\mathrm{PB}}$
is an integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$, and if $h \in \mathcal{L}_1$ is the impulse response of a stable filter, then the
convolution $x_{\mathrm{PB}} \star h$ is defined at every epoch; it is an integrable passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$; and its FT is the
mapping $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \hat{h}(f)$.


Proof. The proof is similar to the proof of the analogous result for bandlimited 
signals (Proposition 6.5.2) and is omitted. 


7.3 Bandwidth around a Carrier Frequency


Definition 7.3.1 (The Bandwidth around a Carrier Frequency). The bandwidth
around the carrier $f_c$ of an integrable or energy-limited passband signal $x_{\mathrm{PB}}$ is
the smallest W for which both (7.1b) and (7.1c) hold.


Note 7.3.2 (The Carrier Frequency Is Critical). The bandwidth of $x_{\mathrm{PB}}$ around
the carrier frequency $f_c$ is determined not only by the FT of $x_{\mathrm{PB}}$ but also by $f_c$.


For example, the real passband signal whose FT is depicted in Figure 7.3 is of
bandwidth W around the carrier frequency $f_c$, but its bandwidth is smaller around
a slightly higher carrier frequency.


At first it may seem that the definition of bandwidth for passband signals is inconsistent
with the definition for baseband signals. This, however, is not the case. A
good way to remember the definitions is to focus on real signals. For such signals
the bandwidth for both baseband and passband signals is defined as the length of
an interval of positive frequencies where the FT of the signal may be nonzero. For
baseband signals the bandwidth is the length of the smallest interval of positive
frequencies of the form $[0, W]$ containing all positive frequencies where the FT may
be nonzero. For passband signals it is the length of the smallest interval of positive
frequencies that is symmetric around the carrier frequency $f_c$ and that contains
all positive frequencies where the signal may be nonzero. (For complex signals we
have to allow for the fact that the zeros of the FT may not be symmetric sets
around the origin.) See also Figures 6.2 and 6.3.

Figure 7.3: The FT of a complex baseband signal of bandwidth W Hz (above)
and of a real passband signal of bandwidth W Hz around the carrier frequency $f_c$
(below).


We draw the reader’s attention to an important consequence of our definition of 
bandwidth: 


Proposition 7.3.3 (Multiplication by a Carrier Doubles the Bandwidth). If x is
an integrable signal of bandwidth W Hz and if $f_c > W$, then $t \mapsto x(t) \cos(2\pi f_c t)$ is
an integrable passband signal of bandwidth 2W around the carrier frequency $f_c$.


Proof. Define $y\colon t \mapsto x(t) \cos(2\pi f_c t)$. The proposition is a straightforward consequence
of the definition of the bandwidth of x (Definition 6.4.13); the definition of
the bandwidth of y around the carrier frequency $f_c$ (Definition 7.3.1); and the fact
that if x is a continuous integrable signal of FT $\hat{x}$, then y is a continuous integrable
signal of FT

\[ \hat{y}(f) = \frac{1}{2} \bigl( \hat{x}(f - f_c) + \hat{x}(f + f_c) \bigr), \quad f \in \mathbb{R}, \tag{7.6} \]

where (7.6) follows from the calculation

\[
\begin{aligned}
\hat{y}(f) &= \int_{-\infty}^{\infty} y(t)\, e^{-i2\pi ft} \, dt \\
&= \int_{-\infty}^{\infty} x(t) \cos(2\pi f_c t)\, e^{-i2\pi ft} \, dt \\
&= \int_{-\infty}^{\infty} x(t)\, \frac{e^{i2\pi f_c t} + e^{-i2\pi f_c t}}{2}\, e^{-i2\pi ft} \, dt \\
&= \frac{1}{2} \int_{-\infty}^{\infty} x(t)\, e^{-i2\pi (f - f_c) t} \, dt + \frac{1}{2} \int_{-\infty}^{\infty} x(t)\, e^{-i2\pi (f + f_c) t} \, dt \\
&= \frac{1}{2} \bigl( \hat{x}(f - f_c) + \hat{x}(f + f_c) \bigr), \quad f \in \mathbb{R}.
\end{aligned}
\]


As an illustration of the relation (7.6) note that if x is the complex bandwidth-W
signal whose FT is depicted in Figure 7.4, then the signal $y\colon t \mapsto x(t) \cos(2\pi f_c t)$ is
the complex passband signal of bandwidth 2W around $f_c$ whose FT is depicted in
Figure 7.5.

Figure 7.4: The FT of a complex baseband bandwidth-W signal x.

Figure 7.5: The FT of $y\colon t \mapsto x(t) \cos(2\pi f_c t)$, where x is as depicted in Figure 7.4.
Note that x is of bandwidth W and that y is of bandwidth 2W around the carrier
frequency $f_c$.

Similarly, if x is the real baseband signal of bandwidth W whose FT is depicted
in Figure 7.6, then $y\colon t \mapsto x(t) \cos(2\pi f_c t)$ is the real passband signal of bandwidth
2W around $f_c$ whose FT is depicted in Figure 7.7.

Figure 7.6: The FT of a real baseband bandwidth-W signal x.

Figure 7.7: The FT of $y\colon t \mapsto x(t) \cos(2\pi f_c t)$, where x is as depicted in Figure 7.6.
Here x is of bandwidth W and y is of bandwidth 2W around the carrier frequency
$f_c$.


In wireless applications the bandwidth W of the signals around the carrier frequency
is typically much smaller than the carrier frequency $f_c$, but for most of our results
it suffices that (7.1b) hold.
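The relation (7.6) can also be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the pulse $x(t) = e^{-\pi t^2}$, which is integrable and continuous but not bandlimited (so only (7.6) is being tested, not the bandwidth claim), together with its known FT $e^{-\pi f^2}$, and it approximates $\hat{y}(f)$ by a Riemann sum. The names, the carrier value, and the integration grid are arbitrary choices.

```python
import cmath
import math

def x(t):
    """Assumed test pulse x(t) = exp(-pi t^2); not an example from the text."""
    return math.exp(-math.pi * t * t)

def x_hat(f):
    """FT of the test pulse; the Gaussian is its own FT."""
    return math.exp(-math.pi * f * f)

def ft_numeric(sig, f, half_width=8.0, steps=16000):
    """Riemann-sum approximation of the FT of sig at frequency f."""
    dt = 2.0 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        t = -half_width + k * dt
        total += sig(t) * cmath.exp(-2j * math.pi * f * t)
    return total * dt

fc = 5.0                                        # carrier frequency (assumed)
y = lambda t: x(t) * math.cos(2.0 * math.pi * fc * t)

# Compare the numerically computed FT of y with the RHS of (7.6).
for f in (4.5, 5.0, -5.2):
    lhs = ft_numeric(y, f)
    rhs = 0.5 * (x_hat(f - fc) + x_hat(f + fc))
    print(f, lhs, rhs)
```

The printed left-hand sides are complex numbers whose imaginary parts are negligible, as expected for a real and even y.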


The notion of a passband signal is also applied somewhat loosely in instances where
the signals are not bandlimited. Engineers say that an energy-limited signal is a
passband signal around the carrier frequency $f_c$ if most of its energy is contained
in frequencies that are close to $f_c$ and $-f_c$. Notice that in this “definition” we are
relying heavily on Parseval’s theorem. That is, we think about the energy $\|x\|_2^2$ of x as
being computed in the frequency domain, i.e., by computing $\|x\|_2^2 = \int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df$.
By “most of the energy is contained in frequencies that are close to $f_c$ and $-f_c$”
we thus mean that most of the contributions to this integral come from small
frequency intervals around $f_c$ and $-f_c$. In other words, we say that x is a passband
signal whose energy is mostly concentrated in a bandwidth W around the carrier
frequency $f_c$ if

\[ \int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df \approx \int_{||f| - f_c| \le W/2} |\hat{x}(f)|^2 \, df. \tag{7.7} \]

Similarly, a signal is approximately a baseband signal that is bandlimited to W Hz
if

\[ \int_{-\infty}^{\infty} |\hat{x}(f)|^2 \, df \approx \int_{-W}^{W} |\hat{x}(f)|^2 \, df. \tag{7.8} \]
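To see what (7.7) looks like for a specific signal, the sketch below estimates the fraction of energy that falls in the band $||f| - f_c| \le W/2$. It assumes the non-bandlimited signal $t \mapsto e^{-\pi t^2} \cos(2\pi f_c t)$, whose FT is $\frac{1}{2}\bigl(e^{-\pi (f - f_c)^2} + e^{-\pi (f + f_c)^2}\bigr)$ by (7.6); the signal, the parameter values, and the function names are illustrative choices and not taken from the text.

```python
import math

def y_hat(f, fc):
    """FT of t -> exp(-pi t^2) cos(2 pi fc t), as given by (7.6) for a Gaussian pulse."""
    return 0.5 * (math.exp(-math.pi * (f - fc) ** 2)
                  + math.exp(-math.pi * (f + fc) ** 2))

def energy(lo, hi, fc, steps=20000):
    """Midpoint-rule approximation of the integral of |y_hat(f)|^2 over [lo, hi]."""
    df = (hi - lo) / steps
    return sum(y_hat(lo + (k + 0.5) * df, fc) ** 2 for k in range(steps)) * df

def in_band_fraction(fc, W, span=10.0):
    """Ratio of the RHS of (7.7) to its LHS (the latter truncated to a wide interval)."""
    total = energy(-fc - span, fc + span, fc)
    band = (energy(fc - W / 2.0, fc + W / 2.0, fc)
            + energy(-fc - W / 2.0, -fc + W / 2.0, fc))
    return band / total

print(in_band_fraction(fc=5.0, W=2.0))   # close to one: (7.7) holds approximately
print(in_band_fraction(fc=5.0, W=0.5))   # noticeably smaller: the band is too narrow
```

How close to one the fraction must be before the signal is called a passband signal is a matter of engineering judgment; the text gives no threshold.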


7.4 Real Passband Signals 


Before discussing the baseband representation of real passband signals we empha- 
size the following. 


(i) The passband signals transmitted and received in Digital Communications 
are real. 


(ii) Only real passband signals have a baseband representation. 


(iii) The baseband representation of a real passband signal is typically a complex 
signal. 


(iv) While the FT of real signals is conjugate-symmetric (6.3), this does not imply 
any symmetry with respect to the carrier frequency. Thus, the FT depicted 
in Figure 7.2 and the one depicted in Figure 7.7 both correspond to real 
passband signals. (The former is bandlimited to W Hz around $f_c$ and the
latter to 2W around $f_c$.)


We also note that if $\mathbf{x}$ is a real integrable signal, then its FT must be conjugate-symmetric.
But if $g \in \mathcal{L}_1$ is such that its IFT $\check{g}$ is real, it does not follow that $g$
must be conjugate-symmetric. For example, the conjugate symmetry could be 
broken on a set of frequencies of Lebesgue measure zero, a set that does not influ- 
ence the IFT. As the next proposition shows, this is the only way the conjugate 
symmetry can be broken. 


Proposition 7.4.1. If $\mathbf{x}$ is a real signal and if $\mathbf{x} = \check{g}$ for some integrable function
$g\colon f \mapsto g(f)$, then:


(i) The signal $\mathbf{x}$ can be represented as the IFT of a conjugate-symmetric integrable
function.


(ii) The function $g$ and the conjugate-symmetric function $f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$
agree except on a set of frequencies of Lebesgue measure zero.


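Proof. Since $\mathbf{x}$ is real and since $\mathbf{x} = \check{g}$ it follows that

\begin{align*}
x(t) &= \operatorname{Re}\bigl(x(t)\bigr) \\
&= \frac{1}{2} x(t) + \frac{1}{2} x^*(t) \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f) \, e^{i 2\pi f t} \, df + \frac{1}{2} \biggl( \int_{-\infty}^{\infty} g(f) \, e^{i 2\pi f t} \, df \biggr)^{*} \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f) \, e^{i 2\pi f t} \, df + \frac{1}{2} \int_{-\infty}^{\infty} g^*(f) \, e^{-i 2\pi f t} \, df \\
&= \frac{1}{2} \int_{-\infty}^{\infty} g(f) \, e^{i 2\pi f t} \, df + \frac{1}{2} \int_{-\infty}^{\infty} g^*(-\tilde{f}) \, e^{i 2\pi \tilde{f} t} \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \frac{g(f) + g^*(-f)}{2} \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R},
\end{align*}

where the first equality follows from the hypothesis that $\mathbf{x}$ is a real signal; the second
because for any $z \in \mathbb{C}$ we have $\operatorname{Re}(z) = (z + z^*)/2$; the third by the hypothesis
that $\mathbf{x} = \check{g}$; the fourth because conjugating a complex integral is tantamount
to conjugating the integrand (Proposition 2.3.1 (ii)); the fifth by changing the
integration variable in the second integral to $\tilde{f} \triangleq -f$; and the sixth by combining
the integrals. Thus, $\mathbf{x}$ is the IFT of the conjugate-symmetric function defined by
$f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$, and (i) is established.


As to (ii), since $\mathbf{x}$ is the IFT of both $g$ and $f \mapsto \bigl(g(f) + g^*(-f)\bigr)/2$, it follows from
the IFT analog of Theorem 6.2.12 that the two agree outside a set of Lebesgue
measure zero.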


7.5 The Analytic Signal 


In this section we shall define the analytic representation of a real passband 
signal. This is also sometimes called the analytic signal associated with the 
signal. We shall use the two terms interchangeably. The analytic representation 
will serve as a steppingstone to the baseband representation, which is extremely 
important in Digital Communications. We emphasize that an analytic signal can 
only be associated with a real passband signal. The analytic signal itself, however, 
is complex-valued. 


7.5.1 Definition and Characterization


Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$. We would have liked to define its analytic representation
as the complex signal $\mathbf{x}_A$ whose FT is the mapping

\[
f \mapsto \hat{x}_{\mathrm{PB}}(f) \, \mathrm{I}\{f \ge 0\}, \tag{7.9}
\]

i.e., as the integrable signal whose FT is equal to zero at negative frequencies and to
$\hat{x}_{\mathrm{PB}}(f)$ at nonnegative frequencies. While this is often the way we think about $\mathbf{x}_A$,
there are two problems with this definition: an existence problem and a uniqueness
problem. It is not prima facie clear that there exists an integrable signal whose FT
is the mapping (7.9). (We shall soon see that there does.) And, since two signals
that differ on a set of Lebesgue measure zero have identical Fourier Transforms, the
above definition would not fully specify $\mathbf{x}_A$. This could be remedied by insisting
that $\mathbf{x}_A$ be continuous, but this would further exacerbate the existence issue. (We
shall see that there does exist a unique integrable continuous signal whose FT is
the mapping (7.9), but this requires proof.) Our approach is to define $\mathbf{x}_A$ as the
IFT of the mapping (7.9) and to then explore the properties of $\mathbf{x}_A$.
Definition 7.5.1 (Analytic Representation of a Real Passband Signal). The analytic
representation of a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited
to W Hz around the carrier frequency $f_c$ is the complex signal $\mathbf{x}_A$ defined by

\[
x_A(t) \triangleq \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{7.10}
\]

Note that, by Proposition 7.2.2, $\hat{x}_{\mathrm{PB}}(f)$ vanishes at frequencies $f$ that satisfy
$\bigl| |f| - f_c \bigr| > W/2$, so we can also write (7.10) as

\[
x_A(t) = \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{7.11}
\]

This latter expression has the advantage that it makes it clear that the integral
is well-defined for every $t \in \mathbb{R}$, because the integrability of $\mathbf{x}_{\mathrm{PB}}$ implies that the
integrand is bounded, i.e., that $|\hat{x}_{\mathrm{PB}}(f)| \le \|\mathbf{x}_{\mathrm{PB}}\|_1$ for every $f \in \mathbb{R}$ (Theorem 6.2.11)
and hence that the mapping $f \mapsto \hat{x}_{\mathrm{PB}}(f) \, \mathrm{I}\{|f - f_c| \le W/2\}$ is integrable.


Also note that our definition of the analytic signal may be off by a factor of two
or $\sqrt{2}$ from the one used in some textbooks. (Some textbooks introduce a factor
of $\sqrt{2}$ in order to make the energy in the analytic signal equal that in the passband
signal. We do not do so and hence end up with a factor of two in (7.23) ahead.)


We next show that the analytic signal $\mathbf{x}_A$ is a continuous and integrable signal and
that its FT is given by the mapping (7.9). In fact, we prove more.


Proposition 7.5.2 (Characterizations of the Analytic Signal). Let $\mathbf{x}_{\mathrm{PB}}$ be a real
integrable passband signal that is bandlimited to W Hz around the carrier frequency
$f_c$. Then each of the following statements is equivalent to the statement
that the complex-valued signal $\mathbf{x}_A$ is its analytic representation.


(a) The signal $\mathbf{x}_A$ is given by

\[
x_A(t) = \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{7.12}
\]

(b) The signal $\mathbf{x}_A$ is a continuous integrable signal satisfying

\[
\hat{x}_A(f) = \begin{cases} \hat{x}_{\mathrm{PB}}(f) & \text{if } f \ge 0, \\ 0 & \text{otherwise.} \end{cases} \tag{7.13}
\]

(c) The signal $\mathbf{x}_A$ is an integrable passband signal that is bandlimited to W Hz
around the carrier frequency $f_c$ and that satisfies (7.13).


(d) The signal $\mathbf{x}_A$ is given by

\[
\mathbf{x}_A = \mathbf{x}_{\mathrm{PB}} \star \check{g} \tag{7.14a}
\]

for every integrable mapping $g\colon f \mapsto g(f)$ satisfying

\[
g(f) = 1, \quad |f - f_c| \le \frac{W}{2}, \tag{7.14b}
\]

and

\[
g(f) = 0, \quad |f + f_c| \le \frac{W}{2} \tag{7.14c}
\]

(with $g(f)$ unspecified at other frequencies).


Proof. That Condition (a) is equivalent to the statement that $\mathbf{x}_A$ is the analytic
representation of $\mathbf{x}_{\mathrm{PB}}$ is just a restatement of Definition 7.5.1. It thus only remains
to show that Conditions (a), (b), (c), and (d) are equivalent. We shall do so by
establishing that (a) $\Leftrightarrow$ (d); that (b) $\Leftrightarrow$ (c); that (b) $\Rightarrow$ (a); and that (d) $\Rightarrow$ (c).
To establish (a) $\Leftrightarrow$ (d) we use the integrability of $\mathbf{x}_{\mathrm{PB}}$ and of $g$ to compute $\mathbf{x}_{\mathrm{PB}} \star \check{g}$
using Proposition 6.2.5 as

\begin{align*}
\bigl(\mathbf{x}_{\mathrm{PB}} \star \check{g}\bigr)(t) &= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, g(f) \, e^{i 2\pi f t} \, df \\
&= \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, g(f) \, e^{i 2\pi f t} \, df \\
&= \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\mathrm{PB}}(f) \, g(f) \, e^{i 2\pi f t} \, df \\
&= \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R},
\end{align*}

where the first equality follows from Proposition 6.2.5; the second because the
assumption that $\mathbf{x}_{\mathrm{PB}}$ is a passband signal implies, by Proposition 7.2.2 (cf. (c)),
that the only negative frequencies $f < 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those
satisfying $|-f - f_c| \le W/2$, and at those frequencies $g$ is zero by (7.14c); the third
by Proposition 7.2.2 (cf. (c)); and the fourth equality by (7.14b). This establishes
that (a) $\Leftrightarrow$ (d).

The equivalence (b) $\Leftrightarrow$ (c) is an immediate consequence of Proposition 7.2.2. That
(b) $\Rightarrow$ (a) can be proved using Corollary 6.2.14 as follows. If (b) holds, then $\mathbf{x}_A$
is a continuous integrable signal whose FT is given by the integrable function on
the RHS of (7.13) and therefore, by Corollary 6.2.14, $\mathbf{x}_A$ is the IFT of the RHS of
(7.13), thus establishing (a).

We now complete the proof by showing that (d) $\Rightarrow$ (c). To this end let $g\colon f \mapsto g(f)$
be a continuous integrable function satisfying (7.14b) & (7.14c) and additionally
satisfying that its IFT $\check{g}$ is integrable. For example, $g$ could be the function from $\mathbb{R}$
to $\mathbb{R}$ that is defined by

\[
g(f) = \begin{cases} 1 & \text{if } |f - f_c| \le W/2, \\[4pt] \dfrac{W_c - 2|f - f_c|}{W_c - W} & \text{if } W/2 < |f - f_c| \le W_c/2, \\[4pt] 0 & \text{otherwise,} \end{cases} \tag{7.15}
\]

where $W_c$ can be chosen arbitrarily in the range

\[
W < W_c < 2 f_c. \tag{7.16}
\]

This function is depicted in Figure 7.8. By direct calculation, it can be shown that
its IFT is given by†

\[
\check{g}(t) = e^{i 2\pi f_c t} \, \frac{1}{(\pi t)^2} \, \frac{\cos(\pi W t) - \cos(\pi W_c t)}{W_c - W}, \quad t \in \mathbb{R}, \tag{7.17}
\]

which is integrable. Define now $\mathbf{h} = \check{g}$ and note that, by Corollary 6.2.14, $\hat{\mathbf{h}} = g$.
If (d) holds, then

\begin{align*}
\mathbf{x}_A &= \mathbf{x}_{\mathrm{PB}} \star \check{g} \\
&= \mathbf{x}_{\mathrm{PB}} \star \mathbf{h},
\end{align*}

so $\mathbf{x}_A$ is the result of feeding an integrable passband signal that is bandlimited
to W Hz around the carrier frequency $f_c$ (the signal $\mathbf{x}_{\mathrm{PB}}$) through a stable filter
(of impulse response $\mathbf{h}$). Consequently, by Proposition 7.2.5, $\mathbf{x}_A$ is an integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$ and
its FT is given by $f \mapsto \hat{x}_{\mathrm{PB}}(f) \, \hat{h}(f)$. Thus, as we next justify,

\begin{align*}
\hat{x}_A(f) &= \hat{x}_{\mathrm{PB}}(f) \, \hat{h}(f) \\
&= \hat{x}_{\mathrm{PB}}(f) \, g(f) \\
&= \hat{x}_{\mathrm{PB}}(f) \, g(f) \, \mathrm{I}\{f \ge 0\} \\
&= \hat{x}_{\mathrm{PB}}(f) \, \mathrm{I}\{f \ge 0\}, \quad f \in \mathbb{R},
\end{align*}

thus establishing (c). Here the third equality is justified by noting that the
assumption that $\mathbf{x}_{\mathrm{PB}}$ is a passband signal implies, by Proposition 7.2.2 (cf. (c)),
that the only negative frequencies $f < 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those
satisfying $|-f - f_c| \le W/2$, and at those frequencies $g$ is zero by (7.15), (7.16),
and (7.1b). The fourth equality follows by noting that the assumption that $\mathbf{x}_{\mathrm{PB}}$
is a passband signal implies, by Proposition 7.2.2 (cf. (c)), that the only positive
frequencies $f > 0$ where $\hat{x}_{\mathrm{PB}}(f)$ can be nonzero are those satisfying $|f - f_c| \le W/2$
and at those frequencies $g(f) = 1$ by (7.15).


7.5.2 From $\mathbf{x}_A$ back to $\mathbf{x}_{\mathrm{PB}}$


Proposition 7.5.2 describes the analytic representation $\mathbf{x}_A$ in terms of the real
passband signal $\mathbf{x}_{\mathrm{PB}}$. This representation would have been useless if we had not
been able to recover $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_A$. Fortunately, we can. The key is that, because
$\mathbf{x}_{\mathrm{PB}}$ is real, its FT is conjugate-symmetric

\[
\hat{x}_{\mathrm{PB}}(-f) = \hat{x}^*_{\mathrm{PB}}(f), \quad f \in \mathbb{R}. \tag{7.18}
\]

Consequently, since the FT of $\mathbf{x}_A$ is equal to that of $\mathbf{x}_{\mathrm{PB}}$ at the positive frequencies
and to zero at the negative frequencies (7.13), we can add to $\hat{\mathbf{x}}_A$ its conjugated
mirror-image to obtain $\hat{\mathbf{x}}_{\mathrm{PB}}$:

\[
\hat{x}_{\mathrm{PB}}(f) = \hat{x}_A(f) + \hat{x}^*_A(-f), \quad f \in \mathbb{R}; \tag{7.19}
\]

see Figure 7.12 on Page 124. From here it is just a technicality to obtain the
time-domain relationship

\[
x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(x_A(t)\bigr), \quad t \in \mathbb{R}. \tag{7.20}
\]

These results are summarized in the following proposition.


† At $t = 0$, the RHS of (7.17) should be interpreted as $(W + W_c)/2$.


Figure 7.8: The function $g$ of (7.15), which is used in the proof of Proposition 7.5.2.


Proposition 7.5.3 (Recovering $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_A$). Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, and let $\mathbf{x}_A$
be its analytic representation. Then,

\[
\hat{x}_{\mathrm{PB}}(f) = \hat{x}_A(f) + \hat{x}^*_A(-f), \quad f \in \mathbb{R}, \tag{7.21a}
\]

and

\[
x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(x_A(t)\bigr), \quad t \in \mathbb{R}. \tag{7.21b}
\]


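Proof. The frequency relation (7.21a) is just a restatement of (7.19), whose derivation
was rigorous. To prove (7.21b) we note that, by Proposition 7.2.2 (cf. (b) & (c)),

\begin{align*}
x_{\mathrm{PB}}(t) &= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df \\
&= \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df + \int_{-\infty}^{0} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df \\
&= x_A(t) + \int_{-\infty}^{0} \hat{x}_{\mathrm{PB}}(f) \, e^{i 2\pi f t} \, df \\
&= x_A(t) + \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(-\tilde{f}) \, e^{-i 2\pi \tilde{f} t} \, d\tilde{f} \\
&= x_A(t) + \int_{0}^{\infty} \hat{x}^*_{\mathrm{PB}}(\tilde{f}) \, e^{-i 2\pi \tilde{f} t} \, d\tilde{f} \\
&= x_A(t) + \biggl( \int_{0}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f}) \, e^{i 2\pi \tilde{f} t} \, d\tilde{f} \biggr)^{*} \\
&= x_A(t) + x^*_A(t) \\
&= 2 \operatorname{Re}\bigl(x_A(t)\bigr), \quad t \in \mathbb{R},
\end{align*}

where in the second equality we broke the integral into two; in the third we used
Definition 7.5.1; in the fourth we changed the integration variable to $\tilde{f} \triangleq -f$;
in the fifth we used the conjugate symmetry of $\hat{\mathbf{x}}_{\mathrm{PB}}$ (7.18); in the sixth we used
the fact that conjugating the integrand results in the conjugation of the integral
(Proposition 2.3.1); in the seventh we used the definition of the analytic signal;
and in the last equality we used the fact that a complex number and its conjugate
add up to twice its real part.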
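The relation (7.21b) can be checked numerically in a discrete-time analog. The following is only a sketch under stated assumptions: the Fourier Transform is replaced by a length-N DFT, the sequence is periodic rather than integrable, and the names `dft`, `idft`, `analytic`, `N`, and `KC` are hypothetical choices made for this illustration. The "analytic" sequence keeps the positive-frequency bins with no factor of two, mirroring Definition 7.5.1.

```python
import cmath
import math

N = 64   # number of samples (illustrative choice)
KC = 10  # carrier bin, standing in for the carrier frequency f_c


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    """Inverse of dft()."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)) / size
            for n in range(size)]


def analytic(x):
    """Keep the positive-frequency bins 1..N/2-1 and zero the rest (no factor of 2)."""
    spectrum = dft(x)
    half = len(x) // 2
    return idft([spectrum[k] if 0 < k < half else 0.0 for k in range(len(x))])


# Real passband-like sequence: a slowly varying envelope times a carrier.
x_pb = [(1.0 + 0.5 * math.cos(2 * math.pi * 2 * n / N))
        * math.cos(2 * math.pi * KC * n / N + 0.3)
        for n in range(N)]
x_a = analytic(x_pb)

# Discrete analog of (7.21b): x_PB(t) = 2 Re(x_A(t)).
max_err = max(abs(x_pb[n] - 2.0 * x_a[n].real) for n in range(N))
print("largest deviation from 2 Re(x_A):", max_err)
```

The deviation is at the level of floating-point rounding because the test sequence has no DC or Nyquist component, which is the discrete counterpart of the passband assumption.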


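7.5.3 Relating $\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ to $\langle \mathbf{x}_A, \mathbf{y}_A \rangle$


We next relate the inner product between two real passband signals to the inner
product between their analytic representations.


Proposition 7.5.4 ($\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ and $\langle \mathbf{x}_A, \mathbf{y}_A \rangle$). Let $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ be real integrable
passband signals that are bandlimited to W Hz around the carrier frequency $f_c$, and
let $\mathbf{x}_A$ and $\mathbf{y}_A$ be their analytic representations. Then

\[
\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle = 2 \operatorname{Re}\bigl(\langle \mathbf{x}_A, \mathbf{y}_A \rangle\bigr), \tag{7.22}
\]

and

\[
\|\mathbf{x}_{\mathrm{PB}}\|_2^2 = 2 \, \|\mathbf{x}_A\|_2^2. \tag{7.23}
\]

Note that in (7.22) the inner product appearing on the LHS is the inner product
between real signals whereas the one appearing on the RHS is between complex
signals.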


Proof. We first note that the inner products and energies are well-defined because
integrable passband signals are also energy-limited (Corollary 7.2.4). Next, even
though (7.23) is a special case of (7.22), we first prove (7.23). The proof is a simple
application of Parseval’s Theorem. The intuition is as follows. Since $\mathbf{x}_{\mathrm{PB}}$ is real,
it follows that its FT is conjugate-symmetric (7.18) so the magnitude of $\hat{\mathbf{x}}_{\mathrm{PB}}$ is
symmetric. Consequently, the positive frequencies and the negative frequencies
of $\hat{\mathbf{x}}_{\mathrm{PB}}$ contribute an equal share to the total energy in $\hat{\mathbf{x}}_{\mathrm{PB}}$. And since the energy
in the analytic representation is equal to the share corresponding to the positive
frequencies only, its energy must be half the energy of $\mathbf{x}_{\mathrm{PB}}$.


This can be argued more formally as follows. Because $\mathbf{x}_{\mathrm{PB}}$ is real-valued, its FT $\hat{\mathbf{x}}_{\mathrm{PB}}$
is conjugate-symmetric (7.18), so its magnitude is symmetric $|\hat{x}_{\mathrm{PB}}(f)| = |\hat{x}_{\mathrm{PB}}(-f)|$
for all $f \in \mathbb{R}$ and, a fortiori,

\[
\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df = \int_{-\infty}^{0} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df. \tag{7.24}
\]

Also, by Parseval’s Theorem (applied to $\mathbf{x}_{\mathrm{PB}}$),

\[
\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df + \int_{-\infty}^{0} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df = \|\mathbf{x}_{\mathrm{PB}}\|_2^2. \tag{7.25}
\]

Consequently, by combining (7.24) and (7.25), we obtain

\[
\int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df = \frac{1}{2} \, \|\mathbf{x}_{\mathrm{PB}}\|_2^2. \tag{7.26}
\]

We can now establish (7.23) from (7.26) by using Parseval’s Theorem (applied
to $\mathbf{x}_A$) and (7.13) to obtain

\begin{align*}
\|\mathbf{x}_A\|_2^2 &= \|\hat{\mathbf{x}}_A\|_2^2 \\
&= \int_{-\infty}^{\infty} |\hat{x}_A(f)|^2 \, df \\
&= \int_{0}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2 \, df \\
&= \frac{1}{2} \, \|\mathbf{x}_{\mathrm{PB}}\|_2^2,
\end{align*}

where the last equality follows from (7.26).


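We next prove (7.22). We offer two proofs. The first is very similar to our proof
of (7.23): we use Parseval’s Theorem to express the inner products in the frequency
domain, and then argue that the contribution of the negative frequencies
to the inner product is the complex conjugate of the contribution of the positive
frequencies. The second proof uses a trick to relate inner products and energies.


We begin with the first proof. Using Proposition 7.5.3 we have

\begin{align*}
\hat{x}_{\mathrm{PB}}(f) &= \hat{x}_A(f) + \hat{x}^*_A(-f), \quad f \in \mathbb{R}, \\
\hat{y}_{\mathrm{PB}}(f) &= \hat{y}_A(f) + \hat{y}^*_A(-f), \quad f \in \mathbb{R}.
\end{align*}

Using Parseval’s Theorem we now have

\begin{align*}
\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle &= \langle \hat{\mathbf{x}}_{\mathrm{PB}}, \hat{\mathbf{y}}_{\mathrm{PB}} \rangle \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, \hat{y}^*_{\mathrm{PB}}(f) \, df \\
&= \int_{-\infty}^{\infty} \bigl(\hat{x}_A(f) + \hat{x}^*_A(-f)\bigr) \bigl(\hat{y}_A(f) + \hat{y}^*_A(-f)\bigr)^{*} \, df \\
&= \int_{-\infty}^{\infty} \bigl(\hat{x}_A(f) + \hat{x}^*_A(-f)\bigr) \bigl(\hat{y}^*_A(f) + \hat{y}_A(-f)\bigr) \, df \\
&= \int_{-\infty}^{\infty} \hat{x}_A(f) \, \hat{y}^*_A(f) \, df + \int_{-\infty}^{\infty} \hat{x}^*_A(-f) \, \hat{y}_A(-f) \, df \\
&= \int_{-\infty}^{\infty} \hat{x}_A(f) \, \hat{y}^*_A(f) \, df + \biggl( \int_{-\infty}^{\infty} \hat{x}_A(-f) \, \hat{y}^*_A(-f) \, df \biggr)^{*} \\
&= \int_{-\infty}^{\infty} \hat{x}_A(f) \, \hat{y}^*_A(f) \, df + \biggl( \int_{-\infty}^{\infty} \hat{x}_A(\tilde{f}) \, \hat{y}^*_A(\tilde{f}) \, d\tilde{f} \biggr)^{*} \\
&= \langle \hat{\mathbf{x}}_A, \hat{\mathbf{y}}_A \rangle + \langle \hat{\mathbf{x}}_A, \hat{\mathbf{y}}_A \rangle^{*} \\
&= 2 \operatorname{Re}\bigl(\langle \hat{\mathbf{x}}_A, \hat{\mathbf{y}}_A \rangle\bigr) \\
&= 2 \operatorname{Re}\bigl(\langle \mathbf{x}_A, \mathbf{y}_A \rangle\bigr),
\end{align*}

where the fifth equality follows because at all frequencies $f \in \mathbb{R}$ the cross-terms
$\hat{x}_A(f) \, \hat{y}_A(-f)$ and $\hat{x}^*_A(-f) \, \hat{y}^*_A(f)$ are zero, and where the last equality follows from
Parseval’s Theorem.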




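The second proof is based on (7.23) and on the identity

\[
2 \operatorname{Re}\bigl(\langle \mathbf{u}, \mathbf{v} \rangle\bigr) = \|\mathbf{u} + \mathbf{v}\|_2^2 - \|\mathbf{u}\|_2^2 - \|\mathbf{v}\|_2^2, \quad \mathbf{u}, \mathbf{v} \in \mathcal{L}_2, \tag{7.27}
\]

which holds for both complex and real signals and which follows by expressing
$\|\mathbf{u} + \mathbf{v}\|_2^2$ as

\begin{align*}
\|\mathbf{u} + \mathbf{v}\|_2^2 &= \langle \mathbf{u} + \mathbf{v}, \mathbf{u} + \mathbf{v} \rangle \\
&= \langle \mathbf{u}, \mathbf{u} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{v}, \mathbf{u} \rangle + \langle \mathbf{v}, \mathbf{v} \rangle \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + \langle \mathbf{u}, \mathbf{v} \rangle + \langle \mathbf{u}, \mathbf{v} \rangle^{*} \\
&= \|\mathbf{u}\|_2^2 + \|\mathbf{v}\|_2^2 + 2 \operatorname{Re}\bigl(\langle \mathbf{u}, \mathbf{v} \rangle\bigr).
\end{align*}

From Identity (7.27) and from (7.23) we have for the real signals $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$

\begin{align*}
2 \langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle &= 2 \operatorname{Re}\bigl(\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle\bigr) \\
&= \|\mathbf{x}_{\mathrm{PB}} + \mathbf{y}_{\mathrm{PB}}\|_2^2 - \|\mathbf{x}_{\mathrm{PB}}\|_2^2 - \|\mathbf{y}_{\mathrm{PB}}\|_2^2 \\
&= 2 \bigl( \|\mathbf{x}_A + \mathbf{y}_A\|_2^2 - \|\mathbf{x}_A\|_2^2 - \|\mathbf{y}_A\|_2^2 \bigr) \\
&= 4 \operatorname{Re}\bigl(\langle \mathbf{x}_A, \mathbf{y}_A \rangle\bigr),
\end{align*}

where the first equality follows because the passband signals are real; the second
from Identity (7.27) applied to the passband signals $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$; the third from
the second part of Proposition 7.5.4 and because the analytic representation of
$\mathbf{x}_{\mathrm{PB}} + \mathbf{y}_{\mathrm{PB}}$ is $\mathbf{x}_A + \mathbf{y}_A$; and the final equality from Identity (7.27) applied to the
analytic signals $\mathbf{x}_A$ and $\mathbf{y}_A$.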
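Both (7.22) and (7.23) have exact discrete-time counterparts, which makes for a quick numerical check. The sketch below is an illustration only and rests on assumptions that are not part of the text: sums over one period of N samples replace the integrals, the DFT replaces the FT, and the helper names (`dft`, `idft`, `analytic`, `inner`) and parameter values are hypothetical.

```python
import cmath
import math

N = 64   # samples per period (illustrative choice)
KC = 12  # carrier bin, standing in for the carrier frequency f_c


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    """Inverse of dft()."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)) / size
            for n in range(size)]


def analytic(x):
    """Keep the positive-frequency bins 1..N/2-1 and zero the rest (no factor of 2)."""
    spectrum = dft(x)
    half = len(x) // 2
    return idft([spectrum[k] if 0 < k < half else 0.0 for k in range(len(x))])


def inner(u, v):
    """Discrete inner product <u, v> = sum of u[n] times the conjugate of v[n]."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


# Two real passband-like sequences sharing the carrier bin KC.
x_pb = [(1.0 + 0.4 * math.cos(2 * math.pi * n / N)) * math.cos(2 * math.pi * KC * n / N)
        for n in range(N)]
y_pb = [(0.7 + 0.2 * math.sin(2 * math.pi * 3 * n / N)) * math.cos(2 * math.pi * KC * n / N + 1.1)
        for n in range(N)]
x_a, y_a = analytic(x_pb), analytic(y_pb)

lhs_inner = inner(x_pb, y_pb).real         # LHS of (7.22)
rhs_inner = 2.0 * inner(x_a, y_a).real     # RHS of (7.22)
lhs_energy = inner(x_pb, x_pb).real        # LHS of (7.23)
rhs_energy = 2.0 * inner(x_a, x_a).real    # RHS of (7.23)
print(lhs_inner, rhs_inner)
print(lhs_energy, rhs_energy)
```

Each printed pair should agree up to rounding. Note that `inner(x_a, y_a)` is in general complex, and only its real part enters (7.22).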


7.6 Baseband Representation of Real Passband Signals 


Strictly speaking, the baseband representation $\mathbf{x}_{\mathrm{BB}}$ of a real passband
signal $\mathbf{x}_{\mathrm{PB}}$ is not a “representation” because one cannot recover $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{BB}}$ alone;
one also needs to know the carrier frequency $f_c$. This may seem like a disadvantage,
but engineers view this as an advantage. Indeed, in some cases, it may illuminate
the fact that certain operations and results do not depend on the carrier frequency.
This decoupling of various operations from the carrier frequency is very useful in
hardware implementation of communication systems that need to work around
selectable carrier frequencies. It allows for some of the processing to be done using
carrier-independent hardware and for only a small part of the communication
system to be tunable to the carrier frequency. Very loosely speaking, engineers
think of $\mathbf{x}_{\mathrm{BB}}$ as everything about $\mathbf{x}_{\mathrm{PB}}$ that is not carrier-dependent. Thus, one
does not usually expect the quantity $f_c$ to show up in a formula for the baseband
representation. Philosophical thoughts aside, the baseband representation has a
straightforward definition.


7.6.1 Definition and Characterization 


Definition 7.6.1 (Baseband Representation). The baseband representation of
a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited to W Hz around the carrier
frequency $f_c$ is the complex signal

\[
x_{\mathrm{BB}}(t) \triangleq e^{-i 2\pi f_c t} \, x_A(t), \quad t \in \mathbb{R}, \tag{7.28}
\]

where $\mathbf{x}_A$ is the analytic representation of $\mathbf{x}_{\mathrm{PB}}$.


Note that, by (7.28), the magnitudes of $\mathbf{x}_A$ and $\mathbf{x}_{\mathrm{BB}}$ are identical

\[
|x_{\mathrm{BB}}(t)| = |x_A(t)|, \quad t \in \mathbb{R}. \tag{7.29}
\]

Consequently, since $\mathbf{x}_A$ is integrable we also have:


Proposition 7.6.2 (Integrability of $\mathbf{x}_{\mathrm{PB}}$ Implies Integrability of $\mathbf{x}_{\mathrm{BB}}$). The baseband
representation of a real integrable passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$ is integrable.


By (7.28) and (7.13) we obtain that if $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband signal that
is bandlimited to W Hz around the carrier frequency $f_c$, then

\[
\hat{x}_{\mathrm{BB}}(f) = \hat{x}_A(f + f_c) = \begin{cases} \hat{x}_{\mathrm{PB}}(f + f_c) & \text{if } |f| \le W/2, \\ 0 & \text{otherwise.} \end{cases} \tag{7.30}
\]

Thus, the FT of $\mathbf{x}_{\mathrm{BB}}$ is the FT of $\mathbf{x}_A$ but shifted to the left by the carrier
frequency $f_c$. The relationship between the Fourier Transforms of $\mathbf{x}_{\mathrm{PB}}$, $\mathbf{x}_A$, and $\mathbf{x}_{\mathrm{BB}}$
is depicted in Figure 7.9.


We have defined the baseband representation of a passband signal in terms of its
analytic representation, but sometimes it is useful to define the baseband representation
directly in terms of the passband signal. This is not very difficult. Rather
than taking the passband signal and passing it through a filter of frequency response
$g$ satisfying (7.14) to obtain $\mathbf{x}_A$ and then multiplying the result by $e^{-i 2\pi f_c t}$
to obtain $\mathbf{x}_{\mathrm{BB}}$, we can multiply $\mathbf{x}_{\mathrm{PB}}$ by $t \mapsto e^{-i 2\pi f_c t}$ and then filter the result to
obtain the baseband representation. This procedure is depicted in the frequency
domain in Figure 7.10 and is made precise in the following proposition.


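Proposition 7.6.3 (From $\mathbf{x}_{\mathrm{PB}}$ to $\mathbf{x}_{\mathrm{BB}}$ Directly). If $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, then its baseband
representation $\mathbf{x}_{\mathrm{BB}}$ is given by

\[
\mathbf{x}_{\mathrm{BB}} = \bigl( t \mapsto e^{-i 2\pi f_c t} \, x_{\mathrm{PB}}(t) \bigr) \star \check{g}_0, \tag{7.31a}
\]

where $g_0\colon f \mapsto g_0(f)$ is any integrable function satisfying

\[
g_0(f) = 1, \quad |f| \le \frac{W}{2}, \tag{7.31b}
\]

and

\[
g_0(f) = 0, \quad |f + 2 f_c| \le \frac{W}{2}. \tag{7.31c}
\]


Figure 7.9: The Fourier Transforms of the analytic signal $\mathbf{x}_A$ and of the baseband
representation $\mathbf{x}_{\mathrm{BB}}$ of a real passband signal $\mathbf{x}_{\mathrm{PB}}$.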


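Proof. The proof is all in Figure 7.10. For the pedantic reader we provide more
details. By Definition 7.6.1 and by Proposition 7.5.2 (cf. (d)) we have for any
integrable function $g\colon f \mapsto g(f)$ satisfying (7.14b) & (7.14c)

\begin{align*}
x_{\mathrm{BB}}(t) &= e^{-i 2\pi f_c t} \, \bigl(\mathbf{x}_{\mathrm{PB}} \star \check{g}\bigr)(t) \\
&= e^{-i 2\pi f_c t} \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, g(f) \, e^{i 2\pi f t} \, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(f) \, g(f) \, e^{i 2\pi (f - f_c) t} \, df \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f} + f_c) \, g(\tilde{f} + f_c) \, e^{i 2\pi \tilde{f} t} \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \hat{x}_{\mathrm{PB}}(\tilde{f} + f_c) \, g_0(\tilde{f}) \, e^{i 2\pi \tilde{f} t} \, d\tilde{f} \\
&= \Bigl( \bigl( \tau \mapsto e^{-i 2\pi f_c \tau} \, x_{\mathrm{PB}}(\tau) \bigr) \star \check{g}_0 \Bigr)(t),
\end{align*}

where we defined

\[
g_0(f) = g(f + f_c), \quad f \in \mathbb{R}, \tag{7.32}
\]

and where we use the following justification. The second equality follows from
Proposition 6.2.5; the third by pulling the complex exponential into the integral;
the fourth by defining $\tilde{f} \triangleq f - f_c$; the fifth by defining the function $g_0$ as in
(7.32); and the final equality by Proposition 6.2.5 using the fact that

\[
\text{the FT of } t \mapsto e^{-i 2\pi f_c t} \, x_{\mathrm{PB}}(t) \text{ is } f \mapsto \hat{x}_{\mathrm{PB}}(f + f_c). \tag{7.33}
\]

The proposition now follows by noting that $g$ satisfies (7.14b) & (7.14c) if, and
only if, the mapping $g_0$ defined in (7.32) satisfies (7.31b) & (7.31c).


Figure 7.10: A frequency-domain description of the process for deriving $\mathbf{x}_{\mathrm{BB}}$ directly
from $\mathbf{x}_{\mathrm{PB}}$. From top to bottom: $\hat{\mathbf{x}}_{\mathrm{PB}}$; the FT of $t \mapsto e^{-i 2\pi f_c t} \, x_{\mathrm{PB}}(t)$; a
function $g_0$ satisfying (7.31b) & (7.31c); and $\hat{\mathbf{x}}_{\mathrm{BB}}$.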


Corollary 7.6.4. If $\mathbf{x}_{\mathrm{PB}}$ is a real integrable passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$, then its baseband representation $\mathbf{x}_{\mathrm{BB}}$ is given
by

\[
\mathbf{x}_{\mathrm{BB}} = \bigl( t \mapsto e^{-i 2\pi f_c t} \, x_{\mathrm{PB}}(t) \bigr) \star \mathrm{LPF}_{W_c}, \tag{7.34a}
\]

where the cutoff frequency $W_c$ can be chosen arbitrarily in the range

\[
\frac{W}{2} \le W_c \le 2 f_c - \frac{W}{2}. \tag{7.34b}
\]

Proof. Let $W_c$ satisfy (7.34b) and define $g_0$ as follows: if $W_c$ is strictly smaller
than $2 f_c - W/2$, define $g_0(f) = \mathrm{I}\{|f| \le W_c\}$; otherwise define $g_0(f) = \mathrm{I}\{|f| < W_c\}$.
In both cases $g_0$ satisfies (7.31b) & (7.31c) and

\[
\check{g}_0 = \mathrm{LPF}_{W_c}. \tag{7.35}
\]

The result now follows by applying Proposition 7.6.3 with this choice of $g_0$.


In analogy to Proposition 7.5.2, we can characterize the baseband representation
of passband signals as follows.


Proposition 7.6.5 (Characterizing the Baseband Representation). Let $\mathbf{x}_{\mathrm{PB}}$ be
a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$. Then each of the following statements is equivalent to the statement
that the complex signal $\mathbf{x}_{\mathrm{BB}}$ is its baseband representation.


(a) The signal $\mathbf{x}_{\mathrm{BB}}$ is given by

\[
x_{\mathrm{BB}}(t) = \int_{-W/2}^{W/2} \hat{x}_{\mathrm{PB}}(f + f_c) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}. \tag{7.36}
\]

(b) The signal $\mathbf{x}_{\mathrm{BB}}$ is a continuous integrable signal satisfying

\[
\hat{x}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{PB}}(f + f_c) \, \mathrm{I}\Bigl\{ |f| \le \frac{W}{2} \Bigr\}, \quad f \in \mathbb{R}. \tag{7.37}
\]

(c) The signal $\mathbf{x}_{\mathrm{BB}}$ is an integrable signal that is bandlimited to W/2 Hz and that
satisfies (7.37).


(d) The signal $\mathbf{x}_{\mathrm{BB}}$ is given by (7.31a) for any $g_0\colon f \mapsto g_0(f)$ satisfying (7.31b)
& (7.31c).


Proof. Parts (a), (b), and (c) can be easily deduced from their counterparts in
Proposition 7.5.2 using Definition 7.6.1 and the fact that (7.29) implies that the
integrability of $\mathbf{x}_{\mathrm{BB}}$ is equivalent to the integrability of $\mathbf{x}_A$. Part (d) is a restatement
of Proposition 7.6.3.


7.6.2 The In-Phase and Quadrature Components


The convolution in (7.34a) is a convolution between a complex signal (the signal
$t \mapsto e^{-i 2\pi f_c t} \, x_{\mathrm{PB}}(t)$) and a real signal (the signal $\mathrm{LPF}_{W_c}$). This should not alarm
you. The convolution of two complex signals evaluated at time $t$ is expressed as an
integral (5.2), and in the case of complex signals this is an integral (over the real
line) of a complex-valued integrand. Such integrals were addressed in Section 2.3.
It should, however, be noted that since the definition of the convolution of two signals
involves their products, the real part of the convolution of two complex-valued
signals is, in general, not equal to the convolution of their real parts. However, as
we next show, if one of the signals is real (as is the case in (7.34a)) then things
become simpler: if $\mathbf{x}$ is a complex-valued function of time and if $\mathbf{h}$ is a real-valued
function of time, then

\[
\operatorname{Re}(\mathbf{x} \star \mathbf{h}) = \operatorname{Re}(\mathbf{x}) \star \mathbf{h} \quad \text{and} \quad \operatorname{Im}(\mathbf{x} \star \mathbf{h}) = \operatorname{Im}(\mathbf{x}) \star \mathbf{h}, \qquad \mathbf{h} \text{ is real-valued.} \tag{7.38}
\]

This follows from the definition of the convolution,

\[
(\mathbf{x} \star \mathbf{h})(t) = \int_{-\infty}^{\infty} x(\tau) \, h(t - \tau) \, d\tau,
\]

and from the basic properties of complex integrals (Proposition 2.3.1) by noting
that if $h(\cdot)$ is real-valued, then for all $t, \tau \in \mathbb{R}$,

\begin{align*}
\operatorname{Re}\bigl(x(\tau) \, h(t - \tau)\bigr) &= \operatorname{Re}\bigl(x(\tau)\bigr) \, h(t - \tau), \\
\operatorname{Im}\bigl(x(\tau) \, h(t - \tau)\bigr) &= \operatorname{Im}\bigl(x(\tau)\bigr) \, h(t - \tau).
\end{align*}
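The identity (7.38), and the warning that it fails when both signals are complex, can be illustrated with finite sequences. The following sketch assumes that discrete linear convolution stands in for the integral; the function `convolve` and the sample values are hypothetical and serve only this illustration.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    out = [0j] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            out[i + j] += xi * hj
    return out


def close(u, v, tol=1e-12):
    """True if the two real sequences agree entry by entry."""
    return len(u) == len(v) and all(abs(a - b) < tol for a, b in zip(u, v))


x = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.25j]  # complex-valued sequence
h = [0.5, -1.0, 2.0]                    # real-valued sequence
g = [1j, 1.0 - 1j]                      # complex-valued sequence, for contrast

# With a real h, taking real or imaginary parts commutes with the convolution.
real_part_ok = close([v.real for v in convolve(x, h)],
                     [v.real for v in convolve([v.real for v in x], h)])
imag_part_ok = close([v.real for v in convolve([v.imag for v in x], h)],
                     [v.imag for v in convolve(x, h)])

# With a complex g, Re(x * g) generally differs from Re(x) * Re(g).
complex_case_differs = not close([v.real for v in convolve(x, g)],
                                 [v.real for v in convolve([v.real for v in x],
                                                           [v.real for v in g])])
print(real_part_ok, imag_part_ok, complex_case_differs)
```

The last flag is true here because the product of two complex numbers mixes their real and imaginary parts, which is exactly the caveat raised before (7.38).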


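We next use (7.38) to express the convolution in (7.31a) using real-number operations.
To that end we first note that since $\mathbf{x}_{\mathrm{PB}}$ is real, it follows from Euler’s
Identity

\[
e^{i\theta} = \cos\theta + i \sin\theta, \quad \theta \in \mathbb{R} \tag{7.39}
\]

that

\begin{align*}
\operatorname{Re}\bigl(x_{\mathrm{PB}}(t) \, e^{-i 2\pi f_c t}\bigr) &= x_{\mathrm{PB}}(t) \cos(2\pi f_c t), \quad t \in \mathbb{R}, \tag{7.40a} \\
\operatorname{Im}\bigl(x_{\mathrm{PB}}(t) \, e^{-i 2\pi f_c t}\bigr) &= -x_{\mathrm{PB}}(t) \sin(2\pi f_c t), \quad t \in \mathbb{R}, \tag{7.40b}
\end{align*}

so by (7.34a), (7.38), and (7.40)

\begin{align*}
\operatorname{Re}(\mathbf{x}_{\mathrm{BB}}) &= \bigl( t \mapsto x_{\mathrm{PB}}(t) \cos(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}, \tag{7.41a} \\
\operatorname{Im}(\mathbf{x}_{\mathrm{BB}}) &= -\bigl( t \mapsto x_{\mathrm{PB}}(t) \sin(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}. \tag{7.41b}
\end{align*}

It is common in the engineering literature to refer to the real part of $\mathbf{x}_{\mathrm{BB}}$ as
the in-phase component of $\mathbf{x}_{\mathrm{PB}}$ and to the imaginary part as the quadrature
component of $\mathbf{x}_{\mathrm{PB}}$.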


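Definition 7.6.6 (In-Phase and Quadrature Components). The in-phase component
of a real integrable passband signal $\mathbf{x}_{\mathrm{PB}}$ that is bandlimited to W Hz around
the carrier frequency $f_c$ is the real part of its baseband representation, i.e.,

\[
\operatorname{Re}(\mathbf{x}_{\mathrm{BB}}) = \bigl( t \mapsto x_{\mathrm{PB}}(t) \cos(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}. \qquad \text{(In-Phase)}
\]

The quadrature component is the imaginary part of its baseband representation,
i.e.,

\[
\operatorname{Im}(\mathbf{x}_{\mathrm{BB}}) = -\bigl( t \mapsto x_{\mathrm{PB}}(t) \sin(2\pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}. \qquad \text{(Quadrature)}
\]

Here $W_c$ is any cutoff frequency in the range $W/2 \le W_c \le 2 f_c - W/2$.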
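The mix-then-filter recipe in the definition above can be tried out on a sampled periodic sequence. This is a minimal sketch under assumptions that go beyond the text: the DFT replaces the FT, an ideal DFT-domain mask plays the role of the low-pass filter, and all names (`dft`, `idft`, `lowpass`, `N`, `KC`, `CUTOFF`) are hypothetical. The passband sequence is built from a known complex baseband sequence `z`, so the recovered in-phase and quadrature components can be compared against its real and imaginary parts.

```python
import cmath
import math

N = 64      # samples per period (illustrative choice)
KC = 10     # carrier bin, standing in for f_c
CUTOFF = 5  # low-pass cutoff bin, standing in for W_c


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    """Inverse of dft()."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)) / size
            for n in range(size)]


def lowpass(x, cutoff):
    """Ideal low-pass filter: keep the DFT bins whose |frequency| is at most cutoff."""
    spectrum = dft(x)
    size = len(x)
    return idft([spectrum[k] if min(k, size - k) <= cutoff else 0.0 for k in range(size)])


# Complex baseband sequence occupying bins -2..2 (the analog of bandwidth W/2).
z = [0.5 + 0.3j * math.cos(2 * math.pi * 2 * n / N) + 0.2 * cmath.exp(2j * math.pi * n / N)
     for n in range(N)]
theta = [2 * math.pi * KC * n / N for n in range(N)]

# Real passband sequence x_PB = 2 Re(z e^{i theta}), as in (7.42b).
x_pb = [2.0 * (z[n] * cmath.exp(1j * theta[n])).real for n in range(N)]

# In-phase and quadrature branches: mix with cos and -sin, then low-pass filter.
in_phase = [v.real for v in lowpass([x_pb[n] * math.cos(theta[n]) for n in range(N)], CUTOFF)]
quadrature = [v.real for v in lowpass([-x_pb[n] * math.sin(theta[n]) for n in range(N)], CUTOFF)]

max_err = max(abs(complex(in_phase[n], quadrature[n]) - z[n]) for n in range(N))
print("largest deviation from z:", max_err)
```

The cutoff bin 5 lies between the baseband half-width (2 bins) and the lower edge of the double-frequency image (18 bins), which mirrors the admissible range for $W_c$ in the definition.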


Figure 7.11 depicts a block diagram of a circuit that produces the baseband rep- 
resentation of a real passband signal. This circuit will play an important role 
in Chapter 9 when we discuss the Sampling Theorem for passband signals and 
complex sampling. 


7.6.3 Bandwidth Considerations


The following is a simple but exceedingly important observation regarding bandwidth.
Recall that the bandwidth of $\mathbf{x}_{\mathrm{PB}}$ around the carrier frequency $f_c$ is defined
in Definition 7.3.1 and that the bandwidth of the baseband signal $\mathbf{x}_{\mathrm{BB}}$ is defined
in Definition 6.4.13.


Proposition 7.6.7 ($\mathbf{x}_{\mathrm{PB}}$, $\mathbf{x}_{\mathrm{BB}}$, and Bandwidth). If the real integrable passband
signal $\mathbf{x}_{\mathrm{PB}}$ is of bandwidth W Hz around the carrier frequency $f_c$, then its baseband
representation $\mathbf{x}_{\mathrm{BB}}$ is an integrable signal of bandwidth W/2 Hz.


Proof. This can be seen graphically from Figure 7.9 or from Figure 7.10. It can
be deduced analytically from (7.30).
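[Block diagram: the input $x_{\mathrm{PB}}(t)$ is multiplied by $\cos(2\pi f_c t)$ to give $x_{\mathrm{PB}}(t) \cos(2\pi f_c t)$ and, in a second branch, by $-\sin(2\pi f_c t)$ to give $-x_{\mathrm{PB}}(t) \sin(2\pi f_c t)$; each product is fed to a low-pass filter $\mathrm{LPF}_{W_c}$ with $W/2 \le W_c \le 2 f_c - W/2$, producing $\operatorname{Re}(x_{\mathrm{BB}}(t))$ and $\operatorname{Im}(x_{\mathrm{BB}}(t))$ respectively.]


Figure 7.11: Obtaining the baseband representation of a real passband signal.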


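7.6.4 Recovering $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{BB}}$


Recovering a real passband signal $\mathbf{x}_{\mathrm{PB}}$ from its baseband representation $\mathbf{x}_{\mathrm{BB}}$ is
conceptually simple. We can recover the analytic representation via (7.28) and
then use Proposition 7.5.3 to recover $\mathbf{x}_{\mathrm{PB}}$:


Proposition 7.6.8 (From $\mathbf{x}_{\mathrm{BB}}$ to $\mathbf{x}_{\mathrm{PB}}$). Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency $f_c$, and let $\mathbf{x}_{\mathrm{BB}}$ be
its baseband representation. Then,

\[
\hat{x}_{\mathrm{PB}}(f) = \hat{x}_{\mathrm{BB}}(f - f_c) + \hat{x}^*_{\mathrm{BB}}(-f - f_c), \quad f \in \mathbb{R}, \tag{7.42a}
\]

and

\[
x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t) \, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{7.42b}
\]

The process of recovering $\mathbf{x}_{\mathrm{PB}}$ from $\mathbf{x}_{\mathrm{BB}}$ is depicted in the frequency domain in
Figure 7.12. It can, of course, also be carried out using real-number operations
only by rewriting (7.42b) as

\[
x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t)\bigr) \cos(2\pi f_c t) - 2 \operatorname{Im}\bigl(x_{\mathrm{BB}}(t)\bigr) \sin(2\pi f_c t), \quad t \in \mathbb{R}. \tag{7.43}
\]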
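The passage from (7.42b) to (7.43) is a routine application of Euler's Identity (7.39). Written out for convenience, splitting the baseband representation into its real and imaginary parts:

```latex
\begin{align*}
2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t) \, e^{i 2\pi f_c t}\bigr)
&= 2 \operatorname{Re}\Bigl( \bigl(\operatorname{Re}(x_{\mathrm{BB}}(t)) + i \operatorname{Im}(x_{\mathrm{BB}}(t))\bigr)
   \bigl(\cos(2\pi f_c t) + i \sin(2\pi f_c t)\bigr) \Bigr) \\
&= 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t)\bigr) \cos(2\pi f_c t)
 - 2 \operatorname{Im}\bigl(x_{\mathrm{BB}}(t)\bigr) \sin(2\pi f_c t), \quad t \in \mathbb{R}.
\end{align*}
```

The minus sign in front of the sine term comes from the product of the two imaginary parts, $i \cdot i = -1$.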
It should be emphasized that (7.42b) does not characterize the baseband representation
of $\mathbf{x}_{\mathrm{PB}}$; it is possible that $x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(z(t) \, e^{i 2\pi f_c t}\bigr)$ hold at every time $t$ and
that $\mathbf{z}$ not be the baseband representation of $\mathbf{x}_{\mathrm{PB}}$. However, as the next proposition
shows, this cannot happen if $\mathbf{z}$ is bandlimited to W/2 Hz.


Figure 7.12: Recovering a passband signal from its baseband representation. Top
plot of $\hat{\mathbf{x}}_{\mathrm{BB}}$ is the transform of $\mathbf{x}_{\mathrm{BB}}$; next is the transform of $t \mapsto x_{\mathrm{BB}}(t) \, e^{i 2\pi f_c t}$; the
transform of $x^*_{\mathrm{BB}}(t)$; the transform of $t \mapsto x^*_{\mathrm{BB}}(t) \, e^{-i 2\pi f_c t}$; and finally the transform
of $t \mapsto x_{\mathrm{BB}}(t) \, e^{i 2\pi f_c t} + x^*_{\mathrm{BB}}(t) \, e^{-i 2\pi f_c t} = 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t) \, e^{i 2\pi f_c t}\bigr) = x_{\mathrm{PB}}(t)$.


Proposition 7.6.9. Let $\mathbf{x}_{\mathrm{PB}}$ be a real integrable passband signal that is bandlimited
to W Hz around the carrier frequency $f_c$. If the complex signal $\mathbf{z}$ satisfies

\[
x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(z(t) \, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{7.44}
\]

and is an integrable signal that is bandlimited to W/2 Hz, then $\mathbf{z}$ is the baseband
representation of $\mathbf{x}_{\mathrm{PB}}$.


Proof. Since $\mathbf{z}$ is bandlimited to W/2 Hz, it follows from Proposition 6.4.10 (cf. (c))
that $\mathbf{z}$ must be continuous and that its FT must vanish for $|f| > W/2$. Consequently,
by Proposition 7.6.5 (cf. (b)), all that remains to show in order to establish
that $\mathbf{z}$ is the baseband representation of $\mathbf{x}_{\mathrm{PB}}$ is that

\[
\hat{z}(f) = \hat{x}_{\mathrm{PB}}(f + f_c), \quad |f| \le W/2, \tag{7.45}
\]

and this is what we proceed to do. By taking the FT of both sides of (7.44) we
obtain that

\[
\hat{x}_{\mathrm{PB}}(f) = \hat{z}(f - f_c) + \hat{z}^*(-f - f_c), \quad f \in \mathbb{R}, \tag{7.46}
\]

or, upon defining $\tilde{f} \triangleq f - f_c$,

\[
\hat{x}_{\mathrm{PB}}(\tilde{f} + f_c) = \hat{z}(\tilde{f}) + \hat{z}^*(-\tilde{f} - 2 f_c), \quad \tilde{f} \in \mathbb{R}. \tag{7.47}
\]

By recalling that $f_c > W/2$ and that $\hat{\mathbf{z}}$ is zero for frequencies $f$ satisfying $|f| > W/2$,
we obtain that $\hat{z}^*(-\tilde{f} - 2 f_c)$ is zero whenever $|\tilde{f}| \le W/2$ so

\[
\hat{z}(\tilde{f}) + \hat{z}^*(-\tilde{f} - 2 f_c) = \hat{z}(\tilde{f}), \quad |\tilde{f}| \le W/2. \tag{7.48}
\]

Combining (7.47) and (7.48) we obtain

\[
\hat{x}_{\mathrm{PB}}(\tilde{f} + f_c) = \hat{z}(\tilde{f}), \quad |\tilde{f}| \le W/2,
\]

thus establishing (7.45) and hence completing the proof.


Proposition 7.6.9 is more useful than its appearance may suggest. It provides an
alternative way of computing the baseband representation of a signal. It demonstrates
that if we can use algebra to express $\mathbf{x}_{\mathrm{PB}}$ in the form (7.44) for some signal $\mathbf{z}$,
and if we can verify that $\mathbf{z}$ is bandlimited to W/2 Hz, then $\mathbf{z}$ must be the baseband
representation of $\mathbf{x}_{\mathrm{PB}}$.
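As an illustration of this algebraic route (a worked example added here, not taken from the text), let $a$ be a real integrable signal that is bandlimited to W/2 Hz, let $f_c > W/2$, let $\varphi$ be a fixed phase, and consider $x_{\mathrm{PB}}(t) = a(t) \cos(2\pi f_c t + \varphi)$. We assume, as is suggested by its FT $f \mapsto \frac{1}{2} e^{i\varphi} \hat{a}(f - f_c) + \frac{1}{2} e^{-i\varphi} \hat{a}(f + f_c)$, that this signal is a real integrable passband signal that is bandlimited to W Hz around $f_c$. Then

```latex
\begin{align*}
x_{\mathrm{PB}}(t) &= a(t) \cos(2\pi f_c t + \varphi) \\
&= \operatorname{Re}\bigl( a(t) \, e^{i\varphi} \, e^{i 2\pi f_c t} \bigr) \\
&= 2 \operatorname{Re}\bigl( z(t) \, e^{i 2\pi f_c t} \bigr),
\qquad z(t) \triangleq \tfrac{1}{2} \, e^{i\varphi} \, a(t), \quad t \in \mathbb{R}.
\end{align*}
```

Since $z$ is a constant multiple of $a$, it is integrable and bandlimited to W/2 Hz, so Proposition 7.6.9 identifies it as the baseband representation. In particular, the in-phase component is $\frac{1}{2} a(t) \cos\varphi$ and the quadrature component is $\frac{1}{2} a(t) \sin\varphi$, and the carrier frequency does not appear in either.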


Note that the proof would also work if we replaced the assumption that z is an 
integrable signal that is bandlimited to W/2 Hz with the assumption that z is an 
integrable signal that is bandlimited to $f_c$ Hz.


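7.6.5 Relating $\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ to $\langle \mathbf{x}_{\mathrm{BB}}, \mathbf{y}_{\mathrm{BB}} \rangle$


If $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ are integrable real passband signals that are bandlimited to W Hz
around the carrier frequency $f_c$, and if $\mathbf{x}_A$, $\mathbf{x}_{\mathrm{BB}}$, $\mathbf{y}_A$, and $\mathbf{y}_{\mathrm{BB}}$ are their corresponding
analytic and baseband representations, then, by (7.28),

\[
\langle \mathbf{x}_{\mathrm{BB}}, \mathbf{y}_{\mathrm{BB}} \rangle = \langle \mathbf{x}_A, \mathbf{y}_A \rangle, \tag{7.49}
\]

because

\begin{align*}
\langle \mathbf{x}_{\mathrm{BB}}, \mathbf{y}_{\mathrm{BB}} \rangle &= \int_{-\infty}^{\infty} x_{\mathrm{BB}}(t) \, y^*_{\mathrm{BB}}(t) \, dt \\
&= \int_{-\infty}^{\infty} e^{-i 2\pi f_c t} \, x_A(t) \, \bigl( e^{-i 2\pi f_c t} \, y_A(t) \bigr)^{*} \, dt \\
&= \int_{-\infty}^{\infty} e^{-i 2\pi f_c t} \, x_A(t) \, e^{i 2\pi f_c t} \, y^*_A(t) \, dt \\
&= \langle \mathbf{x}_A, \mathbf{y}_A \rangle.
\end{align*}

Combining (7.49) with Proposition 7.5.4 we obtain the following relationship between
the inner product between two real passband signals and the inner product
between their corresponding complex baseband representations.


Theorem 7.6.10 ($\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle$ and $\langle \mathbf{x}_{\mathrm{BB}}, \mathbf{y}_{\mathrm{BB}} \rangle$). Let $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ be two real integrable
passband signals that are bandlimited to W Hz around the carrier frequency
$f_c$, and let $\mathbf{x}_{\mathrm{BB}}$ and $\mathbf{y}_{\mathrm{BB}}$ be their corresponding baseband representations. Then

\[
\langle \mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}} \rangle = 2 \operatorname{Re}\bigl(\langle \mathbf{x}_{\mathrm{BB}}, \mathbf{y}_{\mathrm{BB}} \rangle\bigr), \tag{7.50}
\]

and

\[
\|\mathbf{x}_{\mathrm{PB}}\|_2^2 = 2 \, \|\mathbf{x}_{\mathrm{BB}}\|_2^2. \tag{7.51}
\]

An extremely important corollary provides a necessary and sufficient condition for
the inner product between two real passband signals to be zero, i.e., for two real
passband signals to be orthogonal.


Corollary 7.6.11 (Characterizing Orthogonal Real Passband Signals). Two integrable
real passband signals $\mathbf{x}_{\mathrm{PB}}, \mathbf{y}_{\mathrm{PB}}$ that are bandlimited to W Hz around the
carrier frequency $f_c$ are orthogonal if, and only if, the inner product between their
baseband representations is purely imaginary (i.e., of zero real part).


Thus, for two such bandpass signals to be orthogonal their baseband representations
need not be orthogonal. It suffices that their inner product be purely
imaginary.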
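A familiar instance is a common envelope modulated onto a cosine and onto a sine carrier. The sketch below checks it numerically in a discrete-time analog; it is an illustration under assumptions not made in the text (a length-N DFT in place of the FT, sums over one period in place of integrals), and the helper names `dft`, `idft`, `baseband`, and `inner` are hypothetical.

```python
import cmath
import math

N = 64   # samples per period (illustrative choice)
KC = 10  # carrier bin, standing in for f_c


def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    size = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)]


def idft(spectrum):
    """Inverse of dft()."""
    size = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / size) for k in range(size)) / size
            for n in range(size)]


def baseband(x):
    """Discrete analog of (7.28): positive-frequency part of x, shifted down by KC bins."""
    size = len(x)
    spectrum = dft(x)
    x_a = idft([spectrum[k] if 0 < k < size // 2 else 0.0 for k in range(size)])
    return [x_a[n] * cmath.exp(-2j * math.pi * KC * n / size) for n in range(size)]


def inner(u, v):
    """Discrete inner product <u, v> = sum of u[n] times the conjugate of v[n]."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(u, v))


envelope = [1.0 + 0.5 * math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
x_pb = [envelope[n] * math.cos(2 * math.pi * KC * n / N) for n in range(N)]
y_pb = [envelope[n] * math.sin(2 * math.pi * KC * n / N) for n in range(N)]

ip_pb = inner(x_pb, y_pb)                      # passband inner product
ip_bb = inner(baseband(x_pb), baseband(y_pb))  # baseband inner product
print("passband:", ip_pb.real)
print("baseband:", ip_bb)
```

The passband inner product vanishes up to rounding, while the baseband inner product has a vanishing real part but a clearly nonzero imaginary part: the two passband sequences are orthogonal although their baseband representations are not.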


7.6.6 The Baseband Representation of $\mathbf{x}_{\mathrm{PB}} \star \mathbf{y}_{\mathrm{PB}}$


Proposition 7.6.12 (The Baseband Representation of $\mathbf{x}_{\mathrm{PB}} \star \mathbf{y}_{\mathrm{PB}}$ Is $\mathbf{x}_{\mathrm{BB}} \star \mathbf{y}_{\mathrm{BB}}$).
Let $\mathbf{x}_{\mathrm{PB}}$ and $\mathbf{y}_{\mathrm{PB}}$ be real integrable passband signals that are bandlimited to W Hz
around the carrier frequency $f_c$, and let $\mathbf{x}_{\mathrm{BB}}$ and $\mathbf{y}_{\mathrm{BB}}$ be their baseband representations.
Then the convolution $\mathbf{x}_{\mathrm{PB}} \star \mathbf{y}_{\mathrm{PB}}$ is a real integrable passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$ and whose baseband
representation is $\mathbf{x}_{\mathrm{BB}} \star \mathbf{y}_{\mathrm{BB}}$.


Figure 7.13: The convolution of two real passband signals and its baseband representation.
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Proof. The proof is illustrated in Figure 7.13 on Page 127. All that remains is to 
add some technical details. We begin by defining 


Z—=XpBp * YPB 


and by noting that, by Proposition 7.2.5, z is an integrable real passband signal 
that is bandlimited to W Hz around the carrier frequency f, and that its FT is 
given by 

2(f)=4pp(f)dpp(f), FER. (7.52) 


Thus, it is at least meaningful to discuss the baseband representation of xpp xypp. 


We next note that, by Proposition 7.6.5, both $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$ are integrable signals
that are bandlimited to W/2 Hz. Consequently, by Proposition 6.5.2, the convolution
$u = x_{\mathrm{BB}} \star y_{\mathrm{BB}}$ is defined at every epoch $t$ and is also an integrable signal that
is bandlimited to W/2 Hz. Its FT is

$$\hat{u}(f) = \hat{x}_{\mathrm{BB}}(f)\, \hat{y}_{\mathrm{BB}}(f), \quad f \in \mathbb{R}. \tag{7.53}$$

From Proposition 7.6.5 we infer that to prove that $u$ is the baseband representation
of $z$ it only remains to verify that $\hat{u}$ is the mapping $f \mapsto \hat{z}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}$,
which, in view of (7.52) and (7.53), is equivalent to showing that

$$\hat{x}_{\mathrm{BB}}(f)\, \hat{y}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{PB}}(f + f_c)\, \hat{y}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R}. \tag{7.54}$$


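But this follows because the fact that $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$ are the baseband
representations of $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ implies that

$$\hat{x}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R},$$
$$\hat{y}_{\mathrm{BB}}(f) = \hat{y}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R},$$

from which (7.54) follows.

□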
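
A numerical illustration of Proposition 7.6.12 may help. The sketch below is an assumption-laden check rather than part of the proof: it uses Gaussian pulses (only effectively bandlimited), a hypothetical carrier `fc`, and a Riemann-sum convolution on a symmetric grid. It verifies that the passband convolution coincides with the signal obtained by up-converting the baseband convolution.

```python
import numpy as np

fc = 10.0                      # hypothetical carrier frequency in Hz
dt = 0.005                     # grid step in seconds
M = 1200
t = np.arange(-M, M + 1) * dt  # symmetric grid with an odd number of points

def conv(u, v):
    """Riemann-sum approximation of the convolution of u and v on the grid t."""
    return np.convolve(u, v, mode="same") * dt

carrier = np.exp(2j * np.pi * fc * t)
x_bb = (1.0 + 0.5j) * np.exp(-t**2)
y_bb = (0.3 - 1.0j) * np.exp(-(t - 0.5)**2)
x_pb = 2.0 * np.real(x_bb * carrier)
y_pb = 2.0 * np.real(y_bb * carrier)

z = conv(x_pb, y_pb)                    # passband convolution
u = conv(x_bb, y_bb)                    # baseband convolution
z_from_u = 2.0 * np.real(u * carrier)   # up-conversion of the baseband convolution
max_err = np.max(np.abs(z - z_from_u))
```

The odd-length symmetric grid is chosen so that `mode="same"` returns samples aligned with `t`.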


7.6.7 The Baseband Representation of $x_{\mathrm{PB}} \star h$


We next study the result of passing a real integrable passband signal $x_{\mathrm{PB}}$ that is
bandlimited to W Hz around the carrier frequency $f_c$ through a real stable filter
of impulse response $h$. Our focus is on the baseband representation of the result.


Proposition 7.6.13 (Baseband Representation of $x_{\mathrm{PB}} \star h$). Let $x_{\mathrm{PB}}$ be a real
integrable passband signal that is bandlimited to W Hz around the carrier frequency $f_c$,
and let $h$ be a real integrable signal. Then $x_{\mathrm{PB}} \star h$ is defined at every time instant;
it is a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$; and its baseband representation is of FT

$$f \mapsto \hat{x}_{\mathrm{BB}}(f)\, \hat{h}(f + f_c), \quad f \in \mathbb{R}, \tag{7.55}$$

where $x_{\mathrm{BB}}$ is the baseband representation of $x_{\mathrm{PB}}$.


Proof. That the convolution $x_{\mathrm{PB}} \star h$ is defined at every time instant follows from
Proposition 7.2.5. Defining $y = x_{\mathrm{PB}} \star h$ we have by the same proposition that $y$ is
a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$ and that its FT is given by

$$\hat{y}(f) = \hat{x}_{\mathrm{PB}}(f)\, \hat{h}(f), \quad f \in \mathbb{R}. \tag{7.56}$$

Applying Proposition 7.6.5 (cf. (b)) to the signal $y$ we obtain that the baseband
representation of $y$ is of FT

$$f \mapsto \hat{x}_{\mathrm{PB}}(f + f_c)\, \hat{h}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R}. \tag{7.57}$$

To conclude the proof it thus remains to establish that the mappings (7.57) and
(7.55) are identical. But this follows because, by Proposition 7.6.5 (cf. (b)) applied
to the signal $x_{\mathrm{PB}}$,

$$\hat{x}_{\mathrm{BB}}(f) = \hat{x}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\Bigl\{|f| \le \frac{W}{2}\Bigr\}, \quad f \in \mathbb{R}.$$


Motivated by Proposition 7.6.13 we put forth the following definition. 


Definition 7.6.14 (Frequency Response with Respect to a Band). For a stable
real filter of impulse response $h$ we define the frequency response with respect
to the bandwidth W around the carrier frequency $f_c$ (satisfying $f_c > W/2$)
as the mapping

$$f \mapsto \hat{h}(f + f_c)\, \mathrm{I}\Bigl\{|f| \le \frac{W}{2}\Bigr\}. \tag{7.58}$$

Figure 7.14 illustrates the relationship between the frequency response of a real
filter and its response with respect to the carrier frequency $f_c$ and bandwidth W.
Heuristically, we can think of the frequency response with respect to the
bandwidth W around the carrier frequency $f_c$ of a filter of real impulse response $h$ as
the FT of the baseband representation of $h \star \mathrm{BPF}_{W,f_c}$.²


With the aid of Definition 7.6.14 we can restate Proposition 7.6.13 as stating that
the FT of the baseband representation of the result of passing a real integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$ through a
stable real filter is the product of the FT of the baseband representation of the
signal and the frequency response of the filter with respect to the bandwidth W
around the carrier frequency $f_c$. This relationship is illustrated in Figures 7.15
and 7.16. The former depicts the product of the FT of a real passband signal $x_{\mathrm{PB}}$
and the frequency response of a real filter $h$. The latter depicts the product of the
FT of the baseband representation $x_{\mathrm{BB}}$ of $x_{\mathrm{PB}}$ and the frequency response of $h$ with
respect to the bandwidth W around the carrier frequency $f_c$.
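
Definition 7.6.14 translates directly into a small helper. The sketch below is illustrative only: the function name `band_response` and the example filter (the one-sided exponential $h(t) = e^{-t}$ for $t \ge 0$, whose FT is $1/(1 + i2\pi f)$) are assumptions introduced here, not objects from the text.

```python
import numpy as np

def band_response(h_hat, fc, W):
    """Return the mapping f -> h_hat(f + fc) * I{|f| <= W/2} of (7.58)."""
    if not fc > W / 2 > 0:
        raise ValueError("need fc > W/2 > 0")

    def response(f):
        f = np.asarray(f, dtype=float)
        # Shift the frequency response down by fc and keep only |f| <= W/2.
        return np.where(np.abs(f) <= W / 2, h_hat(f + fc), 0.0)

    return response

# Hypothetical stable real filter: h(t) = e^{-t} for t >= 0, zero otherwise.
def h_hat(f):
    return 1.0 / (1.0 + 2j * np.pi * f)

fc, W = 10.0, 4.0
h_band = band_response(h_hat, fc, W)
f = np.array([-3.0, -2.0, 0.0, 1.5, 2.5])
values = h_band(f)     # zero at -3.0 and 2.5, which lie outside |f| <= W/2
```

With such a helper, the restated proposition says that the FT of the baseband representation of the filter output is the pointwise product of the FT of $x_{\mathrm{BB}}$ and `h_band`.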


The relationships among some of the properties of $x_{\mathrm{PB}}$, $x_{\mathrm{A}}$, and $x_{\mathrm{BB}}$ are
summarized in Table 7.1 on Page 142.


² This is mathematically somewhat problematic because $h \star \mathrm{BPF}_{W,f_c}$ need not be an integrable
signal. But this can be remedied because $h \star \mathrm{BPF}_{W,f_c}$ is an energy-limited passband signal
that is bandlimited to W Hz around the carrier frequency and, as such, also has a baseband
representation; see Section 7.7.


130 Passband Signals and Their Representation 


Figure 7.14: A real filter's frequency response (top) and its frequency response
with respect to the bandwidth W around the carrier frequency $f_c$ (bottom).


7.7 Energy-Limited Passband Signals 


We next repeat the results of this chapter under the weaker assumption that the 
passband signal is energy-limited and not necessarily integrable. The key results 
require only minor adjustments, and most of the derivations are almost identical 
and are therefore omitted. The reader is encouraged to focus on the results and to 
read the proofs only if needed. 


7.7.1. Characterization of Energy-Limited Passband Signals 


Recall that energy-limited passband signals were defined in Definition 7.2.1 as 
energy-limited signals that are unaltered by bandpass filtering. In this subsec- 
tion we shall describe alternative characterizations. Aiding us in the character- 
ization is the following lemma, which can be viewed as the passband analog of 
Lemma 6.4.4 (i). 


Lemma 7.7.1. Let $x$ be an energy-limited signal, and let $f_c > W/2 > 0$ be given.
Then the signal $x \star \mathrm{BPF}_{W,f_c}$ can be expressed as

$$(x \star \mathrm{BPF}_{W,f_c})(t) = \int_{||f| - f_c| \le W/2} \hat{x}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}; \tag{7.59}$$

it is of finite energy; and its $\mathcal{L}_2$-Fourier Transform is (the equivalence class of) the
mapping $f \mapsto \hat{x}(f)\, \mathrm{I}\{||f| - f_c| \le W/2\}$.


Figure 7.15: The FT of a passband signal (top); the frequency response of a real
filter (middle); and their product (bottom).


Proof. The lemma follows from Lemma 6.4.4 (ii) by substituting for $g$ the mapping
$f \mapsto \mathrm{I}\{||f| - f_c| \le W/2\}$, whose IFT is $\mathrm{BPF}_{W,f_c}$.
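
Lemma 7.7.1 has a simple discrete counterpart. The sketch below is only an analogy under stated assumptions: it replaces the $\mathcal{L}_2$-Fourier Transform with the DFT of a finite sample vector, and the sampling step `dt`, carrier `fc`, and bandwidth `W` are hypothetical values. Zeroing the DFT bins outside the band mimics multiplying $\hat{x}$ by the indicator $\mathrm{I}\{||f| - f_c| \le W/2\}$.

```python
import numpy as np

def ideal_bandpass(x, dt, fc, W):
    """Discrete stand-in for ideal bandpass filtering: zero every DFT bin outside the band."""
    f = np.fft.fftfreq(len(x), d=dt)           # bin frequencies in Hz
    mask = np.abs(np.abs(f) - fc) <= W / 2     # I{ ||f| - fc| <= W/2 }
    return np.fft.ifft(np.fft.fft(x) * mask)

rng = np.random.default_rng(0)
dt, fc, W = 1e-3, 100.0, 20.0      # hypothetical sampling step, carrier, bandwidth
x = rng.standard_normal(4096)      # real test signal
y = ideal_bandpass(x, dt, fc, W)

energy_x = np.sum(np.abs(x) ** 2) * dt
energy_y = np.sum(np.abs(y) ** 2) * dt
y_again = ideal_bandpass(y, dt, fc, W)   # filtering a passband signal leaves it unaltered
```

In this discrete setting the output energy cannot exceed the input energy, and a second pass through the filter changes nothing, mirroring the defining property of passband signals.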


In analogy to Proposition 6.4.5 we can characterize energy-limited passband signals 
as follows. 


Proposition 7.7.2 (Characterizations of Passband Signals in $\mathcal{L}_2$).


(i) If $x$ is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$, then it can be expressed in the form

$$x(t) = \int_{||f| - f_c| \le W/2} g(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{7.60}$$

for some mapping $g\colon f \mapsto g(f)$ satisfying

$$\int_{||f| - f_c| \le W/2} |g(f)|^2\, df < \infty \tag{7.61}$$

that can be taken as (any function in the equivalence class of) $\hat{x}$.


(ii) If a signal $x$ can be expressed as in (7.60) for some function $g$ satisfying
(7.61), then $x$ is an energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$ and its FT $\hat{x}$ is (the equivalence class of)
the mapping $f \mapsto g(f)\, \mathrm{I}\{||f| - f_c| \le W/2\}$.


Figure 7.16: The FT of the baseband representation of the passband signal $x_{\mathrm{PB}}$ of
Figure 7.15 (top); the frequency response with respect to the bandwidth W around
the carrier frequency $f_c$ of the filter of Figure 7.15 (middle); and their product
(bottom).


Proof. The proof of Part (i) follows from Definition 7.2.1 and from Lemma 7.7.1 in 
very much the same way as Part (i) of Proposition 6.4.5 follows from Definition 6.4.1 
and Lemma 6.4.4 (i). 


The proof of Part (ii) is analogous to the proof of Part (ii) of Proposition 6.4.5. 


As a corollary we obtain the analog of Corollary 7.2.3: 


Corollary 7.7.3 (Passband Signals Are Bandlimited). If $x_{\mathrm{PB}}$ is an energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$, then
it is an energy-limited signal that is bandlimited to $f_c + W/2$ Hz.


Proof. If $x_{\mathrm{PB}}$ is an energy-limited passband signal that is bandlimited to W Hz
around the carrier frequency $f_c$, then, by Proposition 7.7.2 (i), there exists a
function $g\colon f \mapsto g(f)$ satisfying (7.61) such that $x_{\mathrm{PB}}$ is given by (7.60). But this implies
that the function $f \mapsto g(f)\, \mathrm{I}\{||f| - f_c| \le W/2\}$ is an energy-limited function such
that

$$x_{\mathrm{PB}}(t) = \int_{-f_c - W/2}^{f_c + W/2} g(f)\, \mathrm{I}\{||f| - f_c| \le W/2\}\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}, \tag{7.62}$$

so, by Proposition 6.4.5 (ii), $x_{\mathrm{PB}}$ is an energy-limited signal that is bandlimited to
$f_c + W/2$ Hz.


The following is the analog of Proposition 6.4.6.

Proposition 7.7.4.


(i) If $x_{\mathrm{PB}}$ is an energy-limited passband signal that is bandlimited to W Hz
around the carrier frequency $f_c$, then $x_{\mathrm{PB}}$ is a continuous function and all
its energy is contained in the frequencies $f$ satisfying $||f| - f_c| \le W/2$ in the
sense that

$$\int_{-\infty}^{\infty} |\hat{x}_{\mathrm{PB}}(f)|^2\, df = \int_{||f| - f_c| \le W/2} |\hat{x}_{\mathrm{PB}}(f)|^2\, df. \tag{7.63}$$

(ii) If $x_{\mathrm{PB}} \in \mathcal{L}_2$ satisfies (7.63), then $x_{\mathrm{PB}}$ is indistinguishable from the signal
$x_{\mathrm{PB}} \star \mathrm{BPF}_{W,f_c}$, which is an energy-limited passband signal that is bandlimited
to W Hz around $f_c$. If in addition to satisfying (7.63) the signal $x_{\mathrm{PB}}$ is
continuous, then $x_{\mathrm{PB}}$ is an energy-limited passband signal that is bandlimited
to W Hz around the carrier frequency $f_c$.


Proof. This proposition's claims are a subset of those of Proposition 7.7.5, which
summarizes some of the results related to bandpass filtering.


Proposition 7.7.5. Let $y = x \star \mathrm{BPF}_{W,f_c}$ be the result of feeding the signal $x \in \mathcal{L}_2$ to
an ideal unit-gain bandpass filter of bandwidth W around the carrier frequency $f_c$.
Assume $f_c > W/2$. Then:


(i) $y$ is energy-limited with

$$\|y\|_2 \le \|x\|_2. \tag{7.64}$$

(ii) $y$ is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$.


(iii) The $\mathcal{L}_2$-Fourier Transform of $y$ is (the equivalence class of) the mapping
$f \mapsto \hat{x}(f)\, \mathrm{I}\{||f| - f_c| \le W/2\}$.


(iv) All the energy in $y$ is concentrated in the frequencies $\{f : ||f| - f_c| \le W/2\}$
in the sense that

$$\int_{-\infty}^{\infty} |\hat{y}(f)|^2\, df = \int_{||f| - f_c| \le W/2} |\hat{y}(f)|^2\, df.$$

(v) $y$ can be represented as

$$y(t) = \int_{-\infty}^{\infty} \hat{y}(f)\, e^{i 2\pi f t}\, df \tag{7.65}$$
$$\phantom{y(t)} = \int_{||f| - f_c| \le W/2} \hat{x}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.66}$$

(vi) $y$ is uniformly continuous.


(vii) If all the energy of $x$ is concentrated in the frequencies $\{f : ||f| - f_c| \le W/2\}$
in the sense that

$$\int_{-\infty}^{\infty} |\hat{x}(f)|^2\, df = \int_{||f| - f_c| \le W/2} |\hat{x}(f)|^2\, df, \tag{7.67}$$

then $x$ is indistinguishable from the passband signal $x \star \mathrm{BPF}_{W,f_c}$.


(viii) $z$ is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$ if, and only if, it satisfies all three of the following
conditions: it is in $\mathcal{L}_2$; it is continuous; and all its energy is concentrated in
the passband frequencies $\{f : ||f| - f_c| \le W/2\}$.


Proof. The proof is very similar to the proof of Proposition 6.4.7 and is thus
omitted.


7.7.2. The Analytic Representation


If $x_{\mathrm{PB}}$ is a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$, then we define its analytic representation via (7.11). (Since
$x_{\mathrm{PB}} \in \mathcal{L}_2$, it follows from Parseval's Theorem that $\hat{x}_{\mathrm{PB}}$ is energy-limited so, by
Proposition 3.4.3, the mapping $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{|f - f_c| \le W/2\}$ is integrable and
the integral (7.11) is defined for every $t \in \mathbb{R}$. Also, the integral does not depend
on which element of the equivalence class consisting of the $\mathcal{L}_2$-Fourier Transform
of $x_{\mathrm{PB}}$ it is applied to.)

In analogy to Proposition 7.5.2 we can characterize the analytic representation as 
follows. 


Proposition 7.7.6 (Characterizing the Analytic Representation of $x_{\mathrm{PB}} \in \mathcal{L}_2$).
Let $x_{\mathrm{PB}}$ be a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$. Then each of the following statements is equivalent to the
statement that the complex signal $x_{\mathrm{A}}$ is the analytic representation of $x_{\mathrm{PB}}$:


(a) The signal $x_{\mathrm{A}}$ is given by

$$x_{\mathrm{A}}(t) = \int_{f_c - W/2}^{f_c + W/2} \hat{x}_{\mathrm{PB}}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.68}$$

(b) The signal $x_{\mathrm{A}}$ is a continuous energy-limited signal whose $\mathcal{L}_2$-Fourier
Transform $\hat{x}_{\mathrm{A}}$ is (the equivalence class of) the mapping

$$f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{f \ge 0\}. \tag{7.69}$$

(c) The signal $x_{\mathrm{A}}$ is an energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$ and whose $\mathcal{L}_2$-Fourier Transform is (the
equivalence class of) the mapping in (7.69).


(d) The signal $x_{\mathrm{A}}$ is given by

$$x_{\mathrm{A}} = x_{\mathrm{PB}} \star \check{g}, \tag{7.70}$$

where $g\colon f \mapsto g(f)$ is any function in $\mathcal{L}_1 \cap \mathcal{L}_2$ satisfying

$$g(f) = 1, \quad |f - f_c| \le W/2, \tag{7.71a}$$

and

$$g(f) = 0, \quad |f + f_c| \le W/2. \tag{7.71b}$$


Proof. The proof is not very difficult and is omitted. 


We note that the reconstruction formula (7.21b) continues to hold also when $x_{\mathrm{PB}}$
is an energy-limited signal that is bandlimited to W Hz around the carrier
frequency $f_c$.


7.7.3. The Baseband Representation of $x_{\mathrm{PB}} \in \mathcal{L}_2$


Having defined the analytic representation, we now use (7.28) to define the base- 
band representation. 


As in Proposition 7.6.3, we can also describe a procedure for obtaining the base- 
band representation of a passband signal without having to go via the analytic 
representation. 


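Proposition 7.7.7 (From $x_{\mathrm{PB}} \in \mathcal{L}_2$ to $x_{\mathrm{BB}}$ Directly). If $x_{\mathrm{PB}}$ is a real
energy-limited passband signal that is bandlimited to W Hz around the carrier frequency $f_c$,
then its baseband representation $x_{\mathrm{BB}}$ is given by

$$x_{\mathrm{BB}} = \bigl(t \mapsto e^{-i 2\pi f_c t}\, x_{\mathrm{PB}}(t)\bigr) \star \check{g}_0, \tag{7.72}$$

where $g_0\colon f \mapsto g_0(f)$ is any function in $\mathcal{L}_1 \cap \mathcal{L}_2$ satisfying

$$g_0(f) = 1, \quad |f| \le W/2, \tag{7.73a}$$

and

$$g_0(f) = 0, \quad |f + 2 f_c| \le W/2. \tag{7.73b}$$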


Proof. The proof is very similar to the proof of Proposition 7.6.3 and is omitted. 


The following proposition, which is the analog of Proposition 7.6.5, characterizes
the baseband representation of energy-limited passband signals.


Proposition 7.7.8 (Characterizing the Baseband Representation of $x_{\mathrm{PB}} \in \mathcal{L}_2$).
Let $x_{\mathrm{PB}}$ be a real energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$. Then each of the following statements is equivalent to the
statement that the complex signal $x_{\mathrm{BB}}$ is the baseband representation of $x_{\mathrm{PB}}$.


(a) The signal $x_{\mathrm{BB}}$ is given by

$$x_{\mathrm{BB}}(t) = \int_{-W/2}^{W/2} \hat{x}_{\mathrm{PB}}(f + f_c)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{7.74}$$

(b) The signal $x_{\mathrm{BB}}$ is a continuous energy-limited signal whose $\mathcal{L}_2$-Fourier
Transform is (the equivalence class of) the mapping

$$f \mapsto \hat{x}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\{|f| \le W/2\}. \tag{7.75}$$

(c) The signal $x_{\mathrm{BB}}$ is an energy-limited signal that is bandlimited to W/2 Hz
and whose $\mathcal{L}_2$-Fourier Transform is (the equivalence class of) the mapping
(7.75).


(d) The signal $x_{\mathrm{BB}}$ is given by (7.72) for any mapping $g_0\colon f \mapsto g_0(f)$ satisfying
(7.73).


The in-phase component and the quadrature component of an energy-limited
passband signal are defined, as in the integrable case, as the real and imaginary
parts of its baseband representation.


Proposition 7.6.7, which asserts that the bandwidth of $x_{\mathrm{BB}}$ is half the bandwidth
of $x_{\mathrm{PB}}$, continues to hold, as does the reconstruction formula (7.42b).
Proposition 7.6.9 also extends to energy-limited signals. We repeat it (in a slightly more
general way) for future reference.


Proposition 7.7.9. 


(i) If $z$ is an energy-limited signal that is bandlimited to W/2 Hz, and if the
signal $x$ is given by

$$x(t) = 2 \operatorname{Re}\bigl(z(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{7.76}$$

where $f_c > W/2$, then $x$ is a real energy-limited passband signal that is
bandlimited to W Hz around $f_c$, and $z$ is its baseband representation.


(ii) If $x$ is an energy-limited passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$ and if (7.76) holds for some energy-limited signal $z$
that is bandlimited to $f_c$ Hz, then $z$ is the baseband representation of $x$ and
is, in fact, bandlimited to W/2 Hz.


Proof. Omitted. 


Identity (7.50) relating the inner products $\langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle$ and $\langle x_{\mathrm{BB}}, y_{\mathrm{BB}} \rangle$ continues to
hold for energy-limited passband signals that are not necessarily integrable.


Proposition 7.6.12 does not hold for energy-limited signals, because the convolution 
of two energy-limited signals need not be energy-limited. But if we assume that at 
least one of the signals is also integrable, then things sail through. Consequently, 
using Corollary 7.2.4 we obtain: 


Proposition 7.7.10 (The Baseband Representation of $x_{\mathrm{PB}} \star y_{\mathrm{PB}}$ Is $x_{\mathrm{BB}} \star y_{\mathrm{BB}}$).
Let $x_{\mathrm{PB}}$ be a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency $f_c$, and let $y_{\mathrm{PB}}$ be a real energy-limited passband signal that
is bandlimited to W Hz around the carrier frequency $f_c$. Let $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$ be their
corresponding baseband representations. Then $x_{\mathrm{PB}} \star y_{\mathrm{PB}}$ is a real energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$ and whose
baseband representation is $x_{\mathrm{BB}} \star y_{\mathrm{BB}}$.


Proposition 7.6.13 too requires only a slight modification to address energy-limited 
signals. 


Proposition 7.7.11 (Baseband Representation of $x_{\mathrm{PB}} \star h$). Let $x_{\mathrm{PB}}$ be a real
energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$, and let $h$ be a real integrable signal. Then $x_{\mathrm{PB}} \star h$ is defined at every
time instant; it is a real energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$; and its baseband representation is given by

$$(h \star x_{\mathrm{PB}})_{\mathrm{BB}} = h_{\mathrm{BB}} \star x_{\mathrm{BB}}, \tag{7.77}$$

where $h_{\mathrm{BB}}$ is the baseband representation of the energy-limited signal $h \star \mathrm{BPF}_{W,f_c}$.
The $\mathcal{L}_2$-Fourier Transform of the baseband representation of $x_{\mathrm{PB}} \star h$ is (the
equivalence class of) the mapping

$$f \mapsto \hat{x}_{\mathrm{BB}}(f)\, \hat{h}(f + f_c), \quad f \in \mathbb{R}, \tag{7.78}$$

where $x_{\mathrm{BB}}$ is the baseband representation of $x_{\mathrm{PB}}$.


The following theorem summarizes some of the properties of the baseband repre- 
sentation of energy-limited passband signals. 


Theorem 7.7.12 (Properties of the Baseband Representation). 


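(i) The mapping $x_{\mathrm{PB}} \mapsto x_{\mathrm{BB}}$ that maps every real energy-limited passband signal
that is bandlimited to W Hz around the carrier frequency $f_c$ to its baseband
representation is a one-to-one mapping onto the space of complex
energy-limited signals that are bandlimited to W/2 Hz.


(ii) The mapping $x_{\mathrm{PB}} \mapsto x_{\mathrm{BB}}$ is linear in the sense that if $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ are
real energy-limited passband signals that are bandlimited to W Hz around
the carrier frequency $f_c$, and if $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$ are their corresponding
baseband representations, then for every $\alpha, \beta \in \mathbb{R}$, the baseband representation of
$\alpha x_{\mathrm{PB}} + \beta y_{\mathrm{PB}}$ is $\alpha x_{\mathrm{BB}} + \beta y_{\mathrm{BB}}$:

$$(\alpha x_{\mathrm{PB}} + \beta y_{\mathrm{PB}})_{\mathrm{BB}} = \alpha x_{\mathrm{BB}} + \beta y_{\mathrm{BB}}, \quad \alpha, \beta \in \mathbb{R}. \tag{7.79}$$

(iii) The mapping $x_{\mathrm{PB}} \mapsto x_{\mathrm{BB}}$ is (to within a factor of two) energy preserving
in the sense that

$$\|x_{\mathrm{PB}}\|_2^2 = 2\, \|x_{\mathrm{BB}}\|_2^2. \tag{7.80}$$

(iv) Inner products are related via

$$\langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle = 2 \operatorname{Re}\bigl(\langle x_{\mathrm{BB}}, y_{\mathrm{BB}} \rangle\bigr), \tag{7.81}$$

for $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ as above.


(v) The (baseband) bandwidth of $x_{\mathrm{BB}}$ is half the bandwidth of $x_{\mathrm{PB}}$ around the
carrier frequency $f_c$.


(vi) The baseband representation $x_{\mathrm{BB}}$ can be expressed in terms of $x_{\mathrm{PB}}$ as

$$x_{\mathrm{BB}} = \bigl(t \mapsto e^{-i 2\pi f_c t}\, x_{\mathrm{PB}}(t)\bigr) \star \mathrm{LPF}_{W_c}, \tag{7.82a}$$

where $W_c$ is any cutoff frequency satisfying

$$W/2 \le W_c \le 2 f_c - W/2. \tag{7.82b}$$

(vii) The real passband signal $x_{\mathrm{PB}}$ can be expressed in terms of its baseband
representation $x_{\mathrm{BB}}$ as

$$x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{7.83}$$

(viii) If $h$ is a real integrable signal, and if $x_{\mathrm{PB}}$ is as above, then $h \star x_{\mathrm{PB}}$ is a real
energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$, and its baseband representation is given by

$$(h \star x_{\mathrm{PB}})_{\mathrm{BB}} = h_{\mathrm{BB}} \star x_{\mathrm{BB}}, \tag{7.84}$$

where $h_{\mathrm{BB}}$ is the baseband representation of the energy-limited real signal
$h \star \mathrm{BPF}_{W,f_c}$.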
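
Parts (iii), (vi), and (vii) of the theorem can be exercised together in a round trip. The following sketch rests on several assumptions that are not in the text: a Gaussian pulse stands in for a bandlimited baseband signal, the ideal lowpass filter is approximated by zeroing DFT bins, and `fc`, `Wc`, `dt`, and `N` are hypothetical values with `Wc` inside the range (7.82b) for the effective bandwidth of the pulse.

```python
import numpy as np

fc, Wc = 10.0, 10.0            # hypothetical carrier and lowpass cutoff in Hz
dt, N = 0.005, 4000
t = (np.arange(N) - N // 2) * dt

def lowpass(x, cutoff):
    """Discrete stand-in for ideal lowpass filtering: keep DFT bins with |f| <= cutoff."""
    f = np.fft.fftfreq(len(x), d=dt)
    return np.fft.ifft(np.fft.fft(x) * (np.abs(f) <= cutoff))

# Build a passband signal from a known, effectively bandlimited baseband signal.
x_bb_true = (1.0 - 0.7j) * np.exp(-(t - 0.2) ** 2)
x_pb = 2.0 * np.real(x_bb_true * np.exp(2j * np.pi * fc * t))     # as in (7.83)

# Down-convert and lowpass filter, as in (7.82a); then reconstruct, as in (7.83).
x_bb = lowpass(np.exp(-2j * np.pi * fc * t) * x_pb, Wc)
x_pb_back = 2.0 * np.real(x_bb * np.exp(2j * np.pi * fc * t))

energy_pb = np.sum(x_pb ** 2) * dt            # approximates the energy of x_PB
energy_bb = np.sum(np.abs(x_bb) ** 2) * dt    # approximates the energy of x_BB
```

Under these assumptions the recovered `x_bb` matches the signal used to build `x_pb`, and the two energies differ by the factor of two in (7.80).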


7.8 Shifting to Passband and Convolving 


The following result is almost trivial if you think about its interpretation in the
frequency domain. To that end, it is good to focus on the case where the signal $x$
is a bandlimited baseband signal and where $f_c$ is positive and large. In this case
we can interpret the LHS of (7.85) as the result of taking the baseband signal $x$,
up-converting it to passband by forming the signal $\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}$, and then
convolving the result with $h$. The RHS corresponds to down-converting $h$ to form
the signal $\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)$, then convolving this signal with $x$, and then
up-converting the final result.


Proposition 7.8.1. Suppose that $f_c \in \mathbb{R}$ and that (at least) one of the following
conditions holds:


1) The signal $x$ is a measurable bounded signal and $h \in \mathcal{L}_1$.


2) Both $x$ and $h$ are in $\mathcal{L}_2$.

Then, at every epoch $t \in \mathbb{R}$,

$$\Bigl(\bigl(\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}\bigr) \star h\Bigr)(t) = e^{i 2\pi f_c t}\, \Bigl(x \star \bigl(\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)\bigr)\Bigr)(t). \tag{7.85}$$


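Proof. We evaluate the LHS of (7.85) using the definition of the convolution:

$$\begin{aligned}
\Bigl(\bigl(\tau \mapsto x(\tau)\, e^{i 2\pi f_c \tau}\bigr) \star h\Bigr)(t)
&= \int_{-\infty}^{\infty} x(\tau)\, e^{i 2\pi f_c \tau}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t} \int_{-\infty}^{\infty} x(\tau)\, e^{-i 2\pi f_c t}\, e^{i 2\pi f_c \tau}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t} \int_{-\infty}^{\infty} x(\tau)\, e^{-i 2\pi f_c (t - \tau)}\, h(t - \tau)\, d\tau \\
&= e^{i 2\pi f_c t}\, \Bigl(x \star \bigl(\tau \mapsto e^{-i 2\pi f_c \tau}\, h(\tau)\bigr)\Bigr)(t).
\end{aligned}$$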
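
The algebra in the proof carries over verbatim to sums, so (7.85) can be checked on a grid. The sketch below is illustrative: the test signals, the carrier `fc`, and the grid are assumptions made here, and the convolution is a Riemann sum. Because the discrete identity is itself exact, the two sides should agree to rounding error.

```python
import numpy as np

fc = 3.0                       # hypothetical carrier; (7.85) holds for any real fc
dt, M = 0.01, 500
t = np.arange(-M, M + 1) * dt  # symmetric grid, odd length, so "same" stays aligned

def conv(u, v):
    """Riemann-sum approximation of the convolution of u and v on the grid t."""
    return np.convolve(u, v, mode="same") * dt

x = np.cos(2.0 * t) * np.exp(-t ** 2)       # bounded test signal
h = np.exp(-np.abs(t - 0.4))                # integrable test impulse response

# LHS: up-convert x, then convolve with h.
lhs = conv(x * np.exp(2j * np.pi * fc * t), h)
# RHS: down-convert h, convolve with x, then up-convert the result.
rhs = np.exp(2j * np.pi * fc * t) * conv(x, np.exp(-2j * np.pi * fc * t) * h)
max_gap = np.max(np.abs(lhs - rhs))
```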


7.9 Mathematical Comments 


The analytic representation is related to the Hilbert Transform; see, for example,
(Pinsky, 2002, Section 3.4). In our proof that $x_{\mathrm{A}}$ is integrable whenever $x_{\mathrm{PB}}$ is
integrable we implicitly exploited the fact that the strict inequality $f_c > W/2$
implies that for the class of integrable passband signals that are bandlimited to W
Hz around the carrier frequency $f_c$ there exist Hilbert Transform kernels that are
integrable. See, for example, (Logan, 1978, Section 2.5).


7.10 Exercises 


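Exercise 7.1 (Purely Real and Purely Imaginary Baseband Representations). Let $x_{\mathrm{PB}}$
be a real integrable passband signal that is bandlimited to W Hz around the carrier
frequency $f_c$, and let $x_{\mathrm{BB}}$ be its baseband representation.


(i) Show that $x_{\mathrm{BB}}$ is real if, and only if, $\hat{x}_{\mathrm{PB}}$ satisfies

$$\hat{x}_{\mathrm{PB}}(f_c - \delta) = \hat{x}_{\mathrm{PB}}^{*}(f_c + \delta), \quad |\delta| \le \frac{W}{2}.$$

(ii) Show that $x_{\mathrm{BB}}$ is imaginary if, and only if,

$$\hat{x}_{\mathrm{PB}}(f_c - \delta) = -\hat{x}_{\mathrm{PB}}^{*}(f_c + \delta), \quad |\delta| \le \frac{W}{2}.$$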

Exercise 7.2 (Symmetry around the Carrier Frequency). Let $x_{\mathrm{PB}}$ be a real integrable
passband signal that is bandlimited to W Hz around the carrier frequency $f_c$.


(i) Show that $x_{\mathrm{PB}}$ can be written in the form
$$x_{\mathrm{PB}}(t) = w(t) \cos(2\pi f_c t)$$
where $w(\cdot)$ is a real integrable signal that is bandlimited to W/2 Hz if, and only if,

Ses ee eee Hele v 


(ii) Show that $x_{\mathrm{PB}}$ can be written in the form
$$x_{\mathrm{PB}}(t) = w(t) \sin(2\pi f_c t), \quad t \in \mathbb{R}$$
for $w(\cdot)$ as above if, and only if,


Redtpa he oeeste le ate a 


Exercise 7.3 (Viewing a Baseband Signal as a Passband Signal). Let $x$ be a real integrable
signal that is bandlimited to W Hz. Show that if we had informally allowed equality in
(7.1b) and if we had allowed equality between $f_c$ and W/2 in (5.21), then we could have
viewed $x$ also as a real integrable passband signal that is bandlimited to W Hz around
the carrier frequency $f_c = W/2$. Viewed as such, what would have been its complex
baseband representation?


Exercise 7.4 (Bandwidth of the Product of Two Signals). Let $x$ be a real energy-limited
signal that is bandlimited to $W_x$ Hz. Let $y$ be a real energy-limited passband signal that
is bandlimited to $W_y$ Hz around the carrier frequency $f_c$. Show that if $f_c > W_x + W_y/2$,
then the signal $t \mapsto x(t)\, y(t)$ is a real integrable passband signal that is bandlimited to
$2 W_x + W_y$ Hz around the carrier frequency $f_c$.


Exercise 7.5 (Phase Shift). Let $x$ be a real integrable signal that is bandlimited to W Hz.
Let $f_c$ be larger than W.


(i) Express the baseband representation of the real passband signal
$$z_{\mathrm{PB}}(t) = x(t) \sin(2\pi f_c t + \phi), \quad t \in \mathbb{R}$$
in terms of $x(\cdot)$ and $\phi$.


(ii) Compute the Fourier Transform of $z_{\mathrm{PB}}$.


Exercise 7.6 (Energy of a Passband Signal). Let $x \in \mathcal{L}_2$ be of energy $\|x\|_2^2$.


(i) What is the approximate energy in $t \mapsto x(t) \cos(2\pi f_c t)$ if $f_c$ is very large?


(ii) Is your answer exact if $x(\cdot)$ is an energy-limited signal that is bandlimited to W Hz,
where $W < f_c$?


Hint: In Part (i) approximate $x$ as being constant over the periods of $t \mapsto \cos(2\pi f_c t)$.
For Part (ii) see also Problem 6.18.


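Exercise 7.7 (Differences in Passband). Let $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ be real energy-limited passband
signals that are bandlimited to W Hz around the carrier frequency $f_c$. Let $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$
be their baseband representations. Find the relationship between

$$\int_{-\infty}^{\infty} \bigl(x_{\mathrm{PB}}(t) - y_{\mathrm{PB}}(t)\bigr)^2\, dt \quad \text{and} \quad \int_{-\infty}^{\infty} \bigl|x_{\mathrm{BB}}(t) - y_{\mathrm{BB}}(t)\bigr|^2\, dt.$$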


Exercise 7.8 (Reflection of Passband Signal). Let $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ be real integrable
passband signals that are bandlimited to W Hz around the carrier frequency $f_c$. Let $x_{\mathrm{BB}}$
and $y_{\mathrm{BB}}$ be their baseband representations.


(i) Express the baseband representation of Xpp in terms of xpp. 


(ii) Express (xpp,¥pp) in terms of xpp and ygp. 


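Exercise 7.9 (Deducing $x_{\mathrm{BB}}$). Let $x_{\mathrm{PB}}$ be a real integrable passband signal that is
bandlimited to W Hz around the carrier frequency $f_c$. Show that it is possible that $x_{\mathrm{PB}}(t)$ be
given at every epoch $t \in \mathbb{R}$ by $2 \operatorname{Re}\bigl(z(t)\, e^{i 2\pi f_c t}\bigr)$ for some complex signal $z(t)$ and that $z$
not be the baseband representation of $x_{\mathrm{PB}}$. Does this contradict Proposition 7.6.9?


| In terms of $x_{\mathrm{PB}}$ | In terms of $x_{\mathrm{A}}$ | In terms of $x_{\mathrm{BB}}$ |
|---|---|---|
| $x_{\mathrm{PB}}$ | $2 \operatorname{Re}(x_{\mathrm{A}})$ | $t \mapsto 2 \operatorname{Re}\bigl(x_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr)$ |
| $x_{\mathrm{PB}} \star \bigl(t \mapsto e^{i 2\pi f_c t}\, \mathrm{LPF}_{W_c}(t)\bigr)$ | $x_{\mathrm{A}}$ | $t \mapsto e^{i 2\pi f_c t}\, x_{\mathrm{BB}}(t)$ |
| $\bigl(t \mapsto e^{-i 2\pi f_c t}\, x_{\mathrm{PB}}(t)\bigr) \star \mathrm{LPF}_{W_c}$ | $t \mapsto e^{-i 2\pi f_c t}\, x_{\mathrm{A}}(t)$ | $x_{\mathrm{BB}}$ |
| $\hat{x}_{\mathrm{PB}}$ | $f \mapsto \hat{x}_{\mathrm{A}}(f) + \hat{x}_{\mathrm{A}}^{*}(-f)$ | $f \mapsto \hat{x}_{\mathrm{BB}}(f - f_c) + \hat{x}_{\mathrm{BB}}^{*}(-f - f_c)$ |
| $f \mapsto \hat{x}_{\mathrm{PB}}(f)\, \mathrm{I}\{\lvert f - f_c \rvert \le W_c\}$ | $\hat{x}_{\mathrm{A}}$ | $f \mapsto \hat{x}_{\mathrm{BB}}(f - f_c)$ |
| $f \mapsto \hat{x}_{\mathrm{PB}}(f + f_c)\, \mathrm{I}\{\lvert f \rvert \le W_c\}$ | $f \mapsto \hat{x}_{\mathrm{A}}(f + f_c)$ | $\hat{x}_{\mathrm{BB}}$ |
| BW of $x_{\mathrm{PB}}$ around $f_c$ | BW of $x_{\mathrm{A}}$ around $f_c$ | $2 \times$ BW of $x_{\mathrm{BB}}$ |
| $\tfrac{1}{2} \times$ BW of $x_{\mathrm{PB}}$ around $f_c$ | $\tfrac{1}{2} \times$ BW of $x_{\mathrm{A}}$ around $f_c$ | BW of $x_{\mathrm{BB}}$ |
| $\lVert x_{\mathrm{PB}} \rVert_2^2$ | $2\, \lVert x_{\mathrm{A}} \rVert_2^2$ | $2\, \lVert x_{\mathrm{BB}} \rVert_2^2$ |
| $\tfrac{1}{2}\, \lVert x_{\mathrm{PB}} \rVert_2^2$ | $\lVert x_{\mathrm{A}} \rVert_2^2$ | $\lVert x_{\mathrm{BB}} \rVert_2^2$ |

Table 7.1: Table relating properties of a real integrable passband signal $x_{\mathrm{PB}}$ that is bandlimited to W Hz around the carrier
frequency $f_c$ to those of its analytic representation $x_{\mathrm{A}}$ and its baseband representation $x_{\mathrm{BB}}$. Same-row entries are equal. The cutoff
frequency $W_c$ is assumed to be in the range $W/2 \le W_c \le 2 f_c - W/2$, and BW stands for bandwidth. The transformation from $x_{\mathrm{PB}}$
to $x_{\mathrm{A}}$ is based on Proposition 7.5.2 with the function $g$ in (d) being chosen as the mapping $f \mapsto \mathrm{I}\{\lvert f - f_c \rvert \le W_c\}$.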


Chapter 8 


Complete Orthonormal Systems and the 
Sampling Theorem 


8.1 Introduction 


Like Chapter 4, this chapter deals with the geometry of the space $\mathcal{L}_2$ of
energy-limited signals. Here, however, our focus is on infinite-dimensional linear subspaces
of $\mathcal{L}_2$ and on the notion of a complete orthonormal system (CONS). As an
application of this geometric picture, we shall present the Sampling Theorem as
an orthonormal expansion with respect to a CONS for the space of energy-limited
signals that are bandlimited to W Hz.


8.2 Complete Orthonormal System 


Recall that we denote by $\mathcal{L}_2$ the space of all measurable signals $u\colon \mathbb{R} \to \mathbb{C}$ satisfying

$$\int_{-\infty}^{\infty} |u(t)|^2\, dt < \infty.$$

Also recall from Section 4.3 that a subset $\mathcal{U}$ of $\mathcal{L}_2$ is said to be a linear subspace of
$\mathcal{L}_2$ if $\mathcal{U}$ is nonempty and if the signal $\alpha u_1 + \beta u_2$ is in $\mathcal{U}$ whenever $u_1, u_2 \in \mathcal{U}$ and
$\alpha, \beta \in \mathbb{C}$. A linear subspace is said to be finite-dimensional if there exists a finite
number of signals that span it; otherwise, it is said to be infinite-dimensional. The
following are some examples of infinite-dimensional linear subspaces of $\mathcal{L}_2$.


(i) The set of all functions of the form $t \mapsto p(t)\, e^{-|t|}$, where $p(t)$ is any polynomial
(of arbitrary degree).


(ii) The set of all energy-limited signals that vanish outside the interval $[-1, 1]$
(i.e., that map every $t$ outside this interval to zero).


(iii) The set of all energy-limited signals that vanish outside some unspecified
finite interval (i.e., the set containing all signals $u$ for which there exist
some $a, b \in \mathbb{R}$ (depending on $u$) such that $u(t) = 0$ whenever $t \notin [a, b]$).


(iv) The set of all energy-limited signals that are bandlimited to W Hz.


While a basis for an infinite-dimensional subspace can be defined,¹ this notion does
not turn out to be very useful for our purposes. Much more useful to us is the
notion of a complete orthonormal system, which we shall define shortly.²


To motivate the definition, consider a bi-infinite sequence $\ldots, \phi_{-1}, \phi_0, \phi_1, \phi_2, \ldots$
in $\mathcal{L}_2$ satisfying the orthonormality condition

$$\langle \phi_\ell, \phi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}, \tag{8.1}$$

and let $u$ be an arbitrary element of $\mathcal{L}_2$. Define the signals

$$u_L = \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell, \quad L = 1, 2, \ldots \tag{8.2}$$

By Note 4.6.7, $u_L$ is the projection of the vector $u$ onto the subspace spanned
by $(\phi_{-L}, \ldots, \phi_L)$. By the orthonormality (8.1), the tuple $(\phi_{-L}, \ldots, \phi_L)$ is an
orthonormal basis for this subspace. Consequently, by Proposition 4.6.9,

$$\|u\|_2^2 \ge \sum_{\ell = -L}^{L} |\langle u, \phi_\ell \rangle|^2, \quad L = 1, 2, \ldots, \tag{8.3}$$

with equality if, and only if, $u$ is indistinguishable from some linear combination
of $(\phi_{-L}, \ldots, \phi_L)$. This motivates us to explore the situation where (8.3) holds
with equality when $L \to \infty$ and to hope that it corresponds to $u$ being (in some
sense that needs to be made precise) indistinguishable from a limit of finite linear
combinations of $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$
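
The inequality (8.3) and its equality case are easy to observe in a finite-dimensional toy model. The sketch below is an analogy, not the $\mathcal{L}_2$ setting of the text: it assumes the space $\mathbb{C}^n$ with `n = 8`, an orthonormal basis obtained from a QR factorization, and the inner product $\langle u, v \rangle = \sum_k u_k v_k^{*}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8                                   # dimension of the toy space C^n (an assumption)

# Columns of Q form an orthonormal basis of C^n (QR of a random complex matrix).
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Q, _ = np.linalg.qr(A)

def inner(u, v):
    """<u, v> = sum_k u[k] v[k]^*, the discrete counterpart of the L2 inner product."""
    return np.vdot(v, u)

u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
energy_u = inner(u, u).real

captured = []      # sum of |<u, phi_l>|^2 over the first m basis vectors
residual = []      # squared norm of u minus its projection, for the same m
for m in range(1, n + 1):
    phis = Q[:, :m]
    coeffs = np.array([inner(u, phis[:, l]) for l in range(m)])
    projection = phis @ coeffs
    captured.append(np.sum(np.abs(coeffs) ** 2))
    residual.append(np.linalg.norm(u - projection) ** 2)
```

The captured energy never exceeds `energy_u`, the shortfall equals the squared norm of the projection error, and equality is reached once all `n` basis vectors are used.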


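Definition 8.2.1 (Complete Orthonormal System). A bi-infinite sequence of
signals $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ is said to form a complete orthonormal system or a
CONS for the linear subspace $\mathcal{U}$ of $\mathcal{L}_2$ if all three of the following conditions hold:


1) Each element of the sequence is in $\mathcal{U}$:

$$\phi_\ell \in \mathcal{U}, \quad \ell \in \mathbb{Z}. \tag{8.4}$$

2) The sequence satisfies the orthonormality condition

$$\langle \phi_\ell, \phi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{8.5}$$

3) For every $u \in \mathcal{U}$ we have

$$\|u\|_2^2 = \sum_{\ell = -\infty}^{\infty} |\langle u, \phi_\ell \rangle|^2, \quad u \in \mathcal{U}. \tag{8.6}$$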


¹ A basis for a subspace is defined as a collection of functions such that any function in
the subspace can be represented as a linear combination of a finite number of elements in the
collection. More useful to us will be the notion of a complete orthonormal system. From a
complete orthonormal system we only require that each function can be approximated by a linear
combination of a finite number of functions in the system.

² Mathematicians usually define a CONS only for closed subspaces. Such subspaces are
discussed in Section 8.5.


The following proposition considers equivalent definitions of a CONS and
demonstrates that if $\{\phi_\ell\}$ is a CONS for $\mathcal{U}$, then, indeed, every element of $\mathcal{U}$ can be
approximated by a finite linear combination of the functions $\{\phi_\ell\}$.


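Proposition 8.2.2. Let $\mathcal{U}$ be a subspace of $\mathcal{L}_2$ and let the bi-infinite sequence
$\ldots, \phi_{-2}, \phi_{-1}, \phi_0, \phi_1, \ldots$ satisfy (8.4) & (8.5). Then each of the following
conditions on $\{\phi_\ell\}$ is equivalent to the condition that $\{\phi_\ell\}$ forms a CONS for $\mathcal{U}$:


(a) For every $u \in \mathcal{U}$ and every $\epsilon > 0$ there exist some positive integer $L(\epsilon)$ and
coefficients $\alpha_{-L(\epsilon)}, \ldots, \alpha_{L(\epsilon)} \in \mathbb{C}$ such that

$$\Bigl\| u - \sum_{\ell = -L(\epsilon)}^{L(\epsilon)} \alpha_\ell\, \phi_\ell \Bigr\|_2 < \epsilon. \tag{8.7}$$

(b) For every $u \in \mathcal{U}$

$$\lim_{L \to \infty} \Bigl\| u - \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 = 0. \tag{8.8}$$

(c) For every $u \in \mathcal{U}$

$$\|u\|_2^2 = \sum_{\ell = -\infty}^{\infty} |\langle u, \phi_\ell \rangle|^2. \tag{8.9}$$

(d) For every $u, v \in \mathcal{U}$

$$\langle u, v \rangle = \sum_{\ell = -\infty}^{\infty} \langle u, \phi_\ell \rangle\, \langle v, \phi_\ell \rangle^{*}. \tag{8.10}$$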
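
Conditions (c) and (d) can be seen at work in a finite-dimensional analog. The sketch below is illustrative only and makes assumptions outside the text: the space is $\mathbb{C}^n$ with `n = 16`, and the orthonormal system is the unitary DFT basis $\phi_\ell[k] = e^{i 2\pi \ell k / n} / \sqrt{n}$, which is complete for $\mathbb{C}^n$.

```python
import numpy as np

n = 16                                          # toy dimension (an assumption)
k = np.arange(n)
# Row l holds phi_l[k] = exp(i 2 pi l k / n) / sqrt(n); the rows are orthonormal.
Phi = np.exp(2j * np.pi * np.outer(k, k) / n) / np.sqrt(n)

rng = np.random.default_rng(2)
u = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(n) + 1j * rng.standard_normal(n)

cu = Phi.conj() @ u        # cu[l] = <u, phi_l> = sum_k u[k] phi_l[k]^*
cv = Phi.conj() @ v        # cv[l] = <v, phi_l>

gram = Phi @ Phi.conj().T                     # entries <phi_l, phi_l'>, cf. (8.5)
lhs = np.sum(u * np.conj(v))                  # <u, v>
rhs = np.sum(cu * np.conj(cv))                # sum_l <u, phi_l> <v, phi_l>^*, cf. (8.10)
energy_gap = abs(np.sum(np.abs(cu) ** 2) - np.sum(np.abs(u) ** 2))   # cf. (8.9)
```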


Proof. Since (8.4) & (8.5) hold (by hypothesis), it follows that the additional
condition (c) is, by Definition 8.2.1, equivalent to $\{\phi_\ell\}$ being a CONS. It thus only
remains to show that the four conditions are equivalent. We shall prove this by
showing that (a) ⇔ (b); that (b) ⇔ (c); and that (c) ⇔ (d).


That (b) implies (a) is obvious because nothing precludes us from choosing $\alpha_\ell$ in
(8.7) to be $\langle u, \phi_\ell \rangle$. That (a) implies (b) follows because, by Note 4.6.7, the signal

$$\sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell,$$

which we denoted in (8.2) by $u_L$, is the projection of $u$ onto the linear subspace
spanned by $(\phi_{-L}, \ldots, \phi_L)$ and as such, by Proposition 4.6.8, best approximates $u$
among all the signals in that subspace. Consequently, replacing $\alpha_\ell$ by $\langle u, \phi_\ell \rangle$ can
only reduce the LHS of (8.7).


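To prove (b) ⇒ (c) we first note that by letting $L$ tend to infinity in (8.3) it follows
that

$$\|u\|_2^2 \ge \sum_{\ell = -\infty}^{\infty} |\langle u, \phi_\ell \rangle|^2, \quad u \in \mathcal{L}_2, \tag{8.11}$$

so to establish (c) we only need to show that if $u$ is in $\mathcal{U}$ then $\|u\|_2^2$ is also
upper-bounded by the RHS of (8.11). To that end we first upper-bound $\|u\|_2$ as

$$\begin{aligned}
\|u\|_2 &= \Bigl\| \Bigl(u - \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell\Bigr) + \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 \\
&\le \Bigl\| u - \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 + \Bigl\| \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 \\
&= \Bigl\| u - \sum_{\ell = -L}^{L} \langle u, \phi_\ell \rangle\, \phi_\ell \Bigr\|_2 + \Bigl( \sum_{\ell = -L}^{L} |\langle u, \phi_\ell \rangle|^2 \Bigr)^{1/2}, \quad u \in \mathcal{L}_2,
\end{aligned} \tag{8.12}$$

where the first equality follows by adding and subtracting a term; the subsequent
inequality by the Triangle Inequality (Proposition 3.4.1); and the final equality by the
orthonormality assumption (8.5) and the Pythagorean Theorem (Theorem 4.5.2).
If Condition (b) holds and if $u$ is in $\mathcal{U}$, then the RHS of (8.12) converges to the
square root of the infinite sum $\sum_{\ell = -\infty}^{\infty} |\langle u, \phi_\ell \rangle|^2$ and thus gives us the desired upper
bound on $\|u\|_2$.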


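We next prove (c) $\Rightarrow$ (b). We assume that (c) holds and that $u$ is in $\mathcal{U}$ and set out
to prove (8.8). To that end we first note that by the basic properties of the inner
product (3.6)-(3.10) and by the orthonormality (8.1) it follows that
\[ \Bigl\langle \underbrace{u - \sum_{\ell'=-L}^{L} \langle u, \phi_{\ell'} \rangle \phi_{\ell'}}_{u'},\ \phi_\ell \Bigr\rangle = \langle u, \phi_\ell \rangle \, \mathrm{I}\{ |\ell| > L \}, \quad (\ell \in \mathbb{Z},\ u \in \mathcal{L}_2). \]
Consequently, if we apply (c) to the under-braced signal $u'$ (which for $u \in \mathcal{U}$ is
also in $\mathcal{U}$) we obtain that (c) implies
\[ \Bigl\| u - \sum_{\ell=-L}^{L} \langle u, \phi_\ell \rangle \phi_\ell \Bigr\|_2^2 = \sum_{|\ell| > L} \bigl| \langle u, \phi_\ell \rangle \bigr|^2, \quad u \in \mathcal{U}. \]
But by applying (c) to $u$ we infer that the RHS of the above tends to zero as $L$
tends to infinity, thus establishing (8.8) and hence (b).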


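We next prove (c) $\Leftrightarrow$ (d). The implication (d) $\Rightarrow$ (c) is obvious because we can
always choose $v$ to be equal to $u$. We consequently focus on proving (c) $\Rightarrow$ (d).
We do so by assuming that $u, v \in \mathcal{U}$ and calculating for every $\beta \in \mathbb{C}$
\begin{align}
|\beta|^2 \|u\|_2^2 + 2 \operatorname{Re}\bigl( \beta \langle u, v \rangle \bigr) + \|v\|_2^2
&= \| \beta u + v \|_2^2 \nonumber \\
&= \sum_{\ell=-\infty}^{\infty} \bigl| \langle \beta u + v, \phi_\ell \rangle \bigr|^2 \nonumber \\
&= \sum_{\ell=-\infty}^{\infty} \bigl| \beta \langle u, \phi_\ell \rangle + \langle v, \phi_\ell \rangle \bigr|^2 \nonumber \\
&= |\beta|^2 \sum_{\ell=-\infty}^{\infty} \bigl| \langle u, \phi_\ell \rangle \bigr|^2 + 2 \operatorname{Re}\Bigl( \beta \sum_{\ell=-\infty}^{\infty} \langle u, \phi_\ell \rangle \langle v, \phi_\ell \rangle^* \Bigr) \nonumber \\
&\quad + \sum_{\ell=-\infty}^{\infty} \bigl| \langle v, \phi_\ell \rangle \bigr|^2, \quad (u, v \in \mathcal{U},\ \beta \in \mathbb{C}), \tag{8.13}
\end{align}
where the first equality follows by writing $\|\beta u + v\|_2^2$ as $\langle \beta u + v, \beta u + v \rangle$ and using
the basic properties of the inner product (3.6)-(3.10); the second by applying (c)
to $\beta u + v$ (which for $u, v \in \mathcal{U}$ is also in $\mathcal{U}$); the third by the basic properties of
the inner product; and the final equality by writing the squared magnitude of a
complex number as its product by its conjugate. By applying (c) to $u$ and by
applying (c) to $v$ we now obtain from (8.13) that
\[ 2 \operatorname{Re}\bigl( \beta \langle u, v \rangle \bigr) = 2 \operatorname{Re}\Bigl( \beta \sum_{\ell=-\infty}^{\infty} \langle u, \phi_\ell \rangle \langle v, \phi_\ell \rangle^* \Bigr), \quad (u, v \in \mathcal{U},\ \beta \in \mathbb{C}), \]
which can only hold for all $\beta \in \mathbb{C}$ (and in particular for both $\beta = 1$ and $\beta = i$) if
\[ \langle u, v \rangle = \sum_{\ell=-\infty}^{\infty} \langle u, \phi_\ell \rangle \langle v, \phi_\ell \rangle^*, \quad u, v \in \mathcal{U}, \]
thus establishing (d).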


We next describe the two complete orthonormal systems that will be of most in- 
terest to us. 


8.3 The Fourier Series


A CONS that you have probably already encountered is the one underlying the 
Fourier Series representation. You may have encountered the Fourier Series in the 
context of periodic functions, but we shall focus on a slightly different view. 


Proposition 8.3.1. For every $T > 0$, the functions $\{\phi_\ell\}$ defined for every integer $\ell$
by
\[ \phi_\ell \colon t \mapsto \frac{1}{\sqrt{2T}} \, e^{i \pi \ell t / T} \, \mathrm{I}\{ |t| \le T \} \tag{8.14} \]
form a CONS for the subspace
\[ \bigl\{ u \in \mathcal{L}_2 : u(t) = 0 \text{ whenever } |t| > T \bigr\} \]
of energy-limited signals that vanish outside the interval $[-T, T]$.


Proof. Follows from Theorem A.3.3 in the appendix by substituting 2T for S. 


Notice that in this case
\[ \langle u, \phi_\ell \rangle = \frac{1}{\sqrt{2T}} \int_{-T}^{T} u(t) \, e^{-i \pi \ell t / T} \, dt \tag{8.15} \]
is the $\ell$-th Fourier Series Coefficient of $u$; see Note A.3.5 in the appendix with $2T$
substituted for $S$.
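
As a numerical sanity check of Proposition 8.3.1 and of (8.15), the following sketch approximates the inner products by a midpoint rule. It is only an illustration under stated assumptions: the value of `T`, the grid size `N`, the truncation level `L`, and the triangular test signal $t \mapsto 1 - |t|/T$ (whose energy is $2T/3$) are arbitrary choices that do not come from the text.

```python
import numpy as np

T = 1.5                               # half-width of [-T, T] (arbitrary choice)
N = 4000                              # number of midpoint-rule cells (arbitrary choice)
h = 2 * T / N
t = -T + (np.arange(N) + 0.5) * h     # cell midpoints in [-T, T]

def phi(ell):
    """phi_ell(t) = exp(i*pi*ell*t/T) / sqrt(2T) on [-T, T], as in (8.14)."""
    return np.exp(1j * np.pi * ell * t / T) / np.sqrt(2 * T)

def inner(u, v):
    """Midpoint-rule approximation of <u, v> = integral of u(t) v*(t) dt."""
    return np.sum(u * np.conj(v)) * h

# Orthonormality: the Gram matrix of phi_{-3}, ..., phi_3 should be the identity.
ells = np.arange(-3, 4)
gram = np.array([[inner(phi(a), phi(b)) for b in ells] for a in ells])

# Energy relation (8.9) for the triangle u(t) = 1 - |t|/T, whose energy is 2T/3.
u = 1 - np.abs(t) / T
L = 100                               # truncation of the bi-infinite sum
coeffs = np.array([inner(u, phi(ell)) for ell in range(-L, L + 1)])
energy_from_coeffs = np.sum(np.abs(coeffs) ** 2)
energy_exact = 2 * T / 3

print(np.max(np.abs(gram - np.eye(len(ells)))))
print(energy_from_coeffs, energy_exact)
```

The truncated sum of squared coefficients should fall just short of the exact energy, in line with (8.11), and approach it as `L` grows.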

Note 8.3.2. The dummy argument $t$ is immaterial in Proposition 8.3.1. Indeed, if
we define for $W > 0$ the linear subspace
\[ \mathcal{V} = \bigl\{ g \in \mathcal{L}_2 : g(f) = 0 \text{ whenever } |f| > W \bigr\}, \tag{8.16} \]
then the functions defined for every integer $\ell$ by
\[ f \mapsto \frac{1}{\sqrt{2W}} \, e^{i \pi \ell f / W} \, \mathrm{I}\{ |f| \le W \} \tag{8.17} \]
form a CONS for this subspace.


This note will be crucial when we next discuss a CONS for the space of energy- 
limited signals that are bandlimited to W Hz. 


8.4 The Sampling Theorem 


We next provide a CONS for the space of energy-limited signals that are band-
limited to W Hz. Recall that if $x$ is an energy-limited signal that is bandlimited
to W Hz, then there exists a measurable function³ $g \colon f \mapsto g(f)$ satisfying
\[ g(f) = 0, \quad |f| > W \tag{8.18} \]
and
\[ \int_{-W}^{W} |g(f)|^2 \, df < \infty, \tag{8.19} \]
such that
\[ x(t) = \int_{-W}^{W} g(f) \, e^{i 2 \pi f t} \, df, \quad t \in \mathbb{R}. \tag{8.20} \]
Conversely, if $g$ is any function satisfying (8.18) & (8.19), and if we define $x$ via
(8.20) as the Inverse Fourier Transform of $g$, then $x$ is an energy-limited signal that
is bandlimited to W Hz and its $\mathcal{L}_2$-Fourier Transform $\hat{x}$ is equal to (the equivalence
class of) $g$.

Thus, if, as in (8.16), we denote by $\mathcal{V}$ the set of all functions (of frequency) satisfying
(8.18) & (8.19), then the set of all energy-limited signals that are bandlimited to W
Hz is just the image of $\mathcal{V}$ under the IFT, i.e., it is the set $\check{\mathcal{V}}$, where
\[ \check{\mathcal{V}} \triangleq \bigl\{ \check{g} : g \in \mathcal{V} \bigr\}. \tag{8.21} \]
By the Mini Parseval Theorem (Proposition 6.2.6 (i)), if $x_1$ and $x_2$ are given by
$\check{g}_1$ and $\check{g}_2$, where $g_1, g_2$ are in $\mathcal{V}$, then
\[ \langle x_1, x_2 \rangle = \langle g_1, g_2 \rangle, \tag{8.22} \]
i.e.,
\[ \langle \check{g}_1, \check{g}_2 \rangle = \langle g_1, g_2 \rangle, \quad g_1, g_2 \in \mathcal{V}. \tag{8.23} \]

³ Loosely speaking, this function is the Fourier Transform of $x$. But since $x$ is not necessarily
integrable, its FT $\hat{x}$ is an equivalence class of signals. Thus, more precisely, the equivalence class
of $g$ is the $\mathcal{L}_2$-Fourier Transform of $x$. Or, stated differently, $g$ can be any one of the signals in
the equivalence class of $\hat{x}$ that is zero outside the interval $[-W, W]$.

The following lemma is a simple but very useful consequence of (8.23). 
Lemma 8.4.1. If $\{\psi_\ell\}$ is a CONS for the subspace $\mathcal{V}$, which is defined in (8.16),
then $\{\check{\psi}_\ell\}$ is a CONS for the subspace $\check{\mathcal{V}}$, which is defined in (8.21).


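Proof. Let $\{\psi_\ell\}$ be a CONS for the subspace $\mathcal{V}$. By (8.23),
\[ \langle \check{\psi}_\ell, \check{\psi}_{\ell'} \rangle = \langle \psi_\ell, \psi_{\ell'} \rangle, \quad \ell, \ell' \in \mathbb{Z}, \]
so our assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$ (and hence that, a fortiori, it satisfies
$\langle \psi_\ell, \psi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}$ for all $\ell, \ell' \in \mathbb{Z}$) implies that
\[ \langle \check{\psi}_\ell, \check{\psi}_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \]
It remains to verify that for every $x \in \check{\mathcal{V}}$
\[ \sum_{\ell=-\infty}^{\infty} \bigl| \langle x, \check{\psi}_\ell \rangle \bigr|^2 = \|x\|_2^2. \]
Equivalently, since every $x \in \check{\mathcal{V}}$ can be written as $\check{g}$ for some $g \in \mathcal{V}$, we need to
show that
\[ \sum_{\ell=-\infty}^{\infty} \bigl| \langle \check{g}, \check{\psi}_\ell \rangle \bigr|^2 = \|\check{g}\|_2^2, \quad g \in \mathcal{V}. \]
This follows from (8.23) and from our assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$
because
\begin{align*}
\sum_{\ell=-\infty}^{\infty} \bigl| \langle \check{g}, \check{\psi}_\ell \rangle \bigr|^2
&= \sum_{\ell=-\infty}^{\infty} \bigl| \langle g, \psi_\ell \rangle \bigr|^2 \\
&= \|g\|_2^2 \\
&= \|\check{g}\|_2^2, \quad g \in \mathcal{V},
\end{align*}
where the first equality follows from (8.23) (by substituting $g$ for $g_1$ and by sub-
stituting $\psi_\ell$ for $g_2$); the second from the assumption that $\{\psi_\ell\}$ is a CONS for $\mathcal{V}$;
and the final equality from (8.23) (by substituting $g$ for $g_1$ and for $g_2$).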


Using this lemma and Note 8.3.2 we now derive a CONS for the subspace $\check{\mathcal{V}}$ of
energy-limited signals that are bandlimited to W Hz.


Proposition 8.4.2 (A CONS for the Subspace of Energy-Limited Signals that 
Are Bandlimited to W Hz). 


(i) The sequence of signals that are defined for every integer $\ell$ by
\[ t \mapsto \sqrt{2W} \operatorname{sinc}(2Wt + \ell) \tag{8.24} \]
forms a CONS for the space of energy-limited signals that are bandlimited
to W Hz.

(ii) If $x$ is an energy-limited signal that is bandlimited to W Hz, then its inner
product with the $\ell$-th signal is given by its scaled sample at time $-\ell/(2W)$:
\[ \bigl\langle x, \ t \mapsto \sqrt{2W} \operatorname{sinc}(2Wt + \ell) \bigr\rangle = \frac{1}{\sqrt{2W}} \, x\Bigl( -\frac{\ell}{2W} \Bigr), \quad \ell \in \mathbb{Z}. \tag{8.25} \]


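Proof. To prove Part (i) we recall that, by Note 8.3.2, the functions defined for
every $\ell \in \mathbb{Z}$ by
\[ \psi_\ell \colon f \mapsto \frac{1}{\sqrt{2W}} \, e^{i \pi \ell f / W} \, \mathrm{I}\{ |f| \le W \} \tag{8.26} \]
form a CONS for the subspace $\mathcal{V}$. Consequently, by Lemma 8.4.1, their Inverse
Fourier Transforms $\{\check{\psi}_\ell\}$ form a CONS for $\check{\mathcal{V}}$. It just remains to evaluate $\check{\psi}_\ell$
explicitly in order to verify that it is a scaled shifted sinc(·):
\begin{align}
\check{\psi}_\ell(t) &= \int_{-\infty}^{\infty} \psi_\ell(f) \, e^{i 2 \pi f t} \, df \nonumber \\
&= \int_{-W}^{W} \frac{1}{\sqrt{2W}} \, e^{i \pi \ell f / W} \, e^{i 2 \pi f t} \, df \tag{8.27} \\
&= \sqrt{2W} \operatorname{sinc}(2Wt + \ell), \tag{8.28}
\end{align}
where the last calculation can be verified by direct computation as in (6.35).

We next prove Part (ii). Since $x$ is an energy-limited signal that is bandlimited
to W Hz, it follows that there exists some $g \in \mathcal{V}$ such that
\[ x = \check{g}, \tag{8.29} \]
i.e.,
\[ x(t) = \int_{-W}^{W} g(f) \, e^{i 2 \pi f t} \, df, \quad t \in \mathbb{R}. \tag{8.30} \]
Consequently,
\begin{align*}
\bigl\langle x, \ t \mapsto \sqrt{2W} \operatorname{sinc}(2Wt + \ell) \bigr\rangle
&= \langle x, \check{\psi}_\ell \rangle \\
&= \langle \check{g}, \check{\psi}_\ell \rangle \\
&= \langle g, \psi_\ell \rangle \\
&= \int_{-W}^{W} g(f) \Bigl( \frac{1}{\sqrt{2W}} \, e^{i \pi \ell f / W} \Bigr)^* df \\
&= \frac{1}{\sqrt{2W}} \int_{-W}^{W} g(f) \, e^{-i \pi \ell f / W} \, df \\
&= \frac{1}{\sqrt{2W}} \, x\Bigl( -\frac{\ell}{2W} \Bigr), \quad \ell \in \mathbb{Z},
\end{align*}
where the first equality follows from (8.28); the second by (8.29); the third by (8.23)
(with the substitution of $g$ for $g_1$ and $\psi_\ell$ for $g_2$); the fourth by the definition of
the inner product and by (8.26); the fifth by conjugating the complex exponential;
and the final equality by substituting $-\ell/(2W)$ for $t$ in (8.30).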
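
The identity (8.25) can be spot-checked numerically. The sketch below is an illustration only: the bandwidth `W`, the test signal $t \mapsto \operatorname{sinc}^2(Wt)$ (whose Fourier Transform is a triangle supported on $[-W, W]$), the integration step, and the truncated time axis are all assumptions made for the example. NumPy's `np.sinc` is the normalized sinc, $\sin(\pi \xi)/(\pi \xi)$.

```python
import numpy as np

W = 1.0                               # bandwidth in Hz (arbitrary choice)
dt = 0.01                             # integration step, much smaller than 1/(2W)
t = np.arange(-200.0, 200.0, dt)      # truncated time axis (tails assumed negligible)

def x(time):
    """Test signal sinc^2(W t); it is energy-limited and bandlimited to W Hz."""
    return np.sinc(W * time) ** 2

def psi_check(ell, time):
    """The ell-th CONS element t -> sqrt(2W) sinc(2W t + ell) of (8.24)."""
    return np.sqrt(2 * W) * np.sinc(2 * W * time + ell)

results = []
for ell in range(-3, 4):
    # Riemann sum for the inner product (both signals are real, so no conjugate).
    lhs = np.sum(x(t) * psi_check(ell, t)) * dt
    # Scaled sample at time -ell/(2W), the RHS of (8.25).
    rhs = x(-ell / (2 * W)) / np.sqrt(2 * W)
    results.append((ell, lhs, rhs))
    print(f"ell={ell:+d}  inner product={lhs:.6f}  scaled sample={rhs:.6f}")
```

The two columns should agree up to the small error caused by truncating the integral to a finite interval.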
Using Proposition 8.4.2 and Proposition 8.2.2 we obtain the following $\mathcal{L}_2$ version
of the Sampling Theorem.


Theorem 8.4.3 ($\mathcal{L}_2$-Sampling Theorem). Let $x$ be an energy-limited signal that
is bandlimited to W Hz, where $W > 0$, and let
\[ T = \frac{1}{2W}. \tag{8.31} \]

(i) The signal $x$ can be reconstructed from the sequence $\ldots, x(-T), x(0), x(T), \ldots$
of its values at integer multiples of $T$ in the sense that
\[ \lim_{L \to \infty} \int_{-\infty}^{\infty} \Bigl| x(t) - \sum_{\ell=-L}^{L} x(-\ell T) \operatorname{sinc}\Bigl( \frac{t}{T} + \ell \Bigr) \Bigr|^2 dt = 0. \]

(ii) The signal's energy can be reconstructed from its samples via the relation
\[ \int_{-\infty}^{\infty} |x(t)|^2 \, dt = T \sum_{\ell=-\infty}^{\infty} |x(\ell T)|^2. \]

(iii) If $y$ is another energy-limited signal that is bandlimited to W Hz, then
\[ \langle x, y \rangle = T \sum_{\ell=-\infty}^{\infty} x(\ell T) \, y^*(\ell T). \]


Note 8.4.4. If $T \le 1/(2W)$, then any energy-limited signal $x$ that is bandlimited
to W Hz is also bandlimited to $1/(2T)$ Hz. Consequently, Theorem 8.4.3 continues
to hold if we replace (8.31) with the condition
\[ 0 < T \le \frac{1}{2W}. \tag{8.32} \]
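
The energy relation of Theorem 8.4.3 (ii), together with Note 8.4.4, can be illustrated with a short numerical sketch. The choices below are assumptions made for the example and are not from the text: the bandwidth `W`, the test signal $t \mapsto \operatorname{sinc}^2(Wt)$, whose energy is $2/(3W)$, and the number of terms kept in the truncated sum.

```python
import numpy as np

W = 1.0                                   # bandwidth in Hz (arbitrary choice)

def x(time):
    """Test signal sinc^2(W t), bandlimited to W Hz, with energy 2/(3W)."""
    return np.sinc(W * time) ** 2

def energy_from_samples(T, num_terms=20000):
    """T * sum over |ell| <= num_terms of |x(ell T)|^2 (truncated sum)."""
    ell = np.arange(-num_terms, num_terms + 1)
    return T * np.sum(np.abs(x(ell * T)) ** 2)

exact = 2.0 / (3.0 * W)                           # integral of sinc^4(W t) dt
critical = energy_from_samples(1 / (2 * W))       # T = 1/(2W), as in (8.31)
oversampled = energy_from_samples(1 / (5 * W))    # T < 1/(2W), as in Note 8.4.4
print(exact, critical, oversampled)
```

Both sample-based values should match the exact energy to within the truncation and rounding error, whether the signal is sampled at $T = 1/(2W)$ or faster.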


Table 8.1 highlights the duality between the Sampling Theorem and the Fourier 
Series. 


We also mention here without proof a version of the Sampling Theorem that allows 
one to reconstruct the signal pointwise, i.e., at every epoch t. Thus, while Theo- 
rem 8.4.3 guarantees that, as more and more terms in the sum of the shifted sinc 
functions are added, the energy in the error function tends to zero, the following 
theorem demonstrates that at every fixed time t the error tends to zero. 


Theorem 8.4.5 (Pointwise Sampling Theorem). If the signal $x$ can be represented
as
\[ x(t) = \int_{-W}^{W} g(f) \, e^{i 2 \pi f t} \, df, \quad t \in \mathbb{R} \tag{8.33} \]
for some function $g$ satisfying
\[ \int_{-W}^{W} |g(f)| \, df < \infty, \tag{8.34} \]
and if $0 < T \le 1/(2W)$, then for every $t \in \mathbb{R}$
\[ x(t) = \lim_{L \to \infty} \sum_{\ell=-L}^{L} x(-\ell T) \operatorname{sinc}\Bigl( \frac{t}{T} + \ell \Bigr). \tag{8.35} \]


Proof. See (Pinsky, 2002, Chapter 4, Section 4.2.3, Theorem 4.2.13).
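
To see the pointwise convergence in (8.35) at work, one can evaluate the partial sums at a fixed epoch for increasing $L$. The following is a minimal sketch under assumptions of our own choosing: the bandwidth `W`, the test signal $t \mapsto \operatorname{sinc}^2(Wt)$ (which is of the form (8.33) with an integrable triangular $g$), and the epoch `t0` are illustrative and do not appear in the text.

```python
import numpy as np

W = 1.0                      # bandwidth in Hz (arbitrary choice)
T = 1 / (2 * W)              # sampling interval, as in (8.31)

def x(time):
    """Test signal sinc^2(W t), an energy-limited signal bandlimited to W Hz."""
    return np.sinc(W * time) ** 2

def partial_sum(time, L):
    """sum_{ell=-L}^{L} x(-ell T) sinc(time/T + ell), the partial sum in (8.35)."""
    ell = np.arange(-L, L + 1)
    return np.sum(x(-ell * T) * np.sinc(time / T + ell))

t0 = 0.3                     # an epoch that is not a sampling instant
errors = {L: abs(x(t0) - partial_sum(t0, L)) for L in (10, 100, 1000)}
for L, err in errors.items():
    print(f"L={L:5d}  |x(t0) - partial sum| = {err:.3e}")
```

For this particular signal the error at `t0` shrinks rapidly as more shifted sincs are included; other signals may converge more slowly.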


The Sampling Theorem goes by various names. It is sometimes attributed to 
Claude Elwood Shannon (1916-2001), the founder of Information Theory. But 
it also appears in the works of Vladimir Aleksandrovich Kotelnikov (1908-2005), 
Harry Nyquist (1889-1976), and Edmund Taylor Whittaker (1873-1956). For fur- 
ther references regarding the history of this result and for a survey of many related 
results, see (Unser, 2000). 


8.5 Closed Subspaces of $\mathcal{L}_2$


Our definition of a CONS for a subspace $\mathcal{U}$ is not quite standard, because we only
assumed that $\mathcal{U}$ is a linear subspace; we did not assume that $\mathcal{U}$ is closed. In this
section we shall define closed linear subspaces and derive a condition for a sequence
$\{\phi_\ell\}$ to form a CONS for a closed subspace $\mathcal{U}$. (The set of energy-limited signals
that vanish outside the interval $[-T, T]$ is closed, as is the class of energy-limited
signals that are bandlimited to W Hz.)

Before proceeding to define closed linear subspaces, we pause here to recall that
the space $\mathcal{L}_2$ is complete.⁴


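Theorem 8.5.1 ($\mathcal{L}_2$ Is Complete). If the sequence $u_1, u_2, \ldots$ of signals in $\mathcal{L}_2$ is
such that for any $\epsilon > 0$ there exists a positive integer $L(\epsilon)$ such that
\[ \| u_n - u_m \|_2 < \epsilon, \quad n, m \ge L(\epsilon), \]
then there exists some function $u \in \mathcal{L}_2$ such that
\[ \lim_{n \to \infty} \| u - u_n \|_2 = 0. \]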


Proof. See, for example, (Rudin, 1974, Chapter 3, Theorem 3.11). 


Definition 8.5.2 (Closed Subspace). A linear subspace $\mathcal{U}$ of $\mathcal{L}_2$ is said to be
closed if for any sequence of signals $u_1, u_2, \ldots$ in $\mathcal{U}$ and any $u \in \mathcal{L}_2$, the condition
$\| u - u_n \|_2 \to 0$ implies that $u$ is indistinguishable from some element of $\mathcal{U}$.


Before stating the next theorem we remind the reader that a bi-infinite sequence
of complex numbers $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ is said to be square summable if
\[ \sum_{\ell=-\infty}^{\infty} |\alpha_\ell|^2 < \infty. \]

⁴ This property is usually stated about $L_2$ but we prefer to work with $\mathcal{L}_2$.

Theorem 8.5.3 (Riesz-Fischer). Let $\mathcal{U}$ be a closed linear subspace of $\mathcal{L}_2$, and let
the bi-infinite sequence $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ satisfy (8.4) & (8.5). Let the bi-infinite
sequence of complex numbers $\ldots, \alpha_{-1}, \alpha_0, \alpha_1, \ldots$ be square summable. Then there
exists an element $u$ in $\mathcal{U}$ satisfying
\[ \lim_{L \to \infty} \Bigl\| u - \sum_{\ell=-L}^{L} \alpha_\ell \phi_\ell \Bigr\|_2 = 0; \tag{8.36a} \]
\[ \langle u, \phi_\ell \rangle = \alpha_\ell, \quad \ell \in \mathbb{Z}; \tag{8.36b} \]
and
\[ \|u\|_2^2 = \sum_{\ell=-\infty}^{\infty} |\alpha_\ell|^2. \tag{8.36c} \]

Proof. Define for every positive integer $L$
\[ u_L = \sum_{\ell=-L}^{L} \alpha_\ell \phi_\ell, \quad L \in \mathbb{N}. \tag{8.37} \]
Since, by hypothesis, $\mathcal{U}$ is a linear subspace and the signals $\{\phi_\ell\}$ are all in $\mathcal{U}$, it fol-
lows that $u_L \in \mathcal{U}$. By the orthonormality assumption (8.5) and by the Pythagorean
Theorem (Theorem 4.5.2), it follows that
\begin{align*}
\| u_n - u_m \|_2^2 &= \sum_{\min\{m,n\} < |\ell| \le \max\{m,n\}} |\alpha_\ell|^2 \\
&\le \sum_{\min\{m,n\} < |\ell| < \infty} |\alpha_\ell|^2, \quad n, m \in \mathbb{N}.
\end{align*}
From this and from the square summability of $\{\alpha_\ell\}$, it follows that for any $\epsilon > 0$
we have that $\| u_n - u_m \|_2$ is smaller than $\epsilon$ whenever both $n$ and $m$ are sufficiently
large. By the completeness of $\mathcal{L}_2$ it thus follows that there exists some $u' \in \mathcal{L}_2$
such that
\[ \lim_{L \to \infty} \| u' - u_L \|_2 = 0. \tag{8.38} \]
Since $\mathcal{U}$ is closed, and since $u_L$ is in $\mathcal{U}$ for every $L \in \mathbb{N}$, it follows from (8.38) that $u'$
is indistinguishable from some element $u$ of $\mathcal{U}$:
\[ \| u - u' \|_2 = 0. \tag{8.39} \]
It now follows from (8.38) and (8.39) that
\[ \lim_{L \to \infty} \| u - u_L \|_2 = 0, \tag{8.40} \]
as can be verified using (4.14) (with the substitution $(u' - u_L)$ for $x$ and $(u - u')$
for $y$). Combining (8.40) with (8.37) establishes (8.36a).

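To establish (8.36b) we use (8.40) and the continuity of the inner product (Propo-
sition 3.4.2) to calculate $\langle u, \phi_\ell \rangle$ for every fixed $\ell \in \mathbb{Z}$ as follows:
\begin{align*}
\langle u, \phi_\ell \rangle &= \lim_{L \to \infty} \langle u_L, \phi_\ell \rangle \\
&= \lim_{L \to \infty} \Bigl\langle \sum_{\ell'=-L}^{L} \alpha_{\ell'} \phi_{\ell'}, \ \phi_\ell \Bigr\rangle \\
&= \lim_{L \to \infty} \alpha_\ell \, \mathrm{I}\{ |\ell| \le L \} \\
&= \alpha_\ell, \quad \ell \in \mathbb{Z},
\end{align*}
where the first equality follows from (8.40) and from the continuity of the inner
product (Proposition 3.4.2); the second by (8.37); the third by the orthonormality
(8.5); and the final equality because $\alpha_\ell \, \mathrm{I}\{ |\ell| \le L \}$ is equal to $\alpha_\ell$ whenever $L$ is
large enough (i.e., is at least $|\ell|$).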


It remains to prove (8.36c). By the orthonormality of $\{\phi_\ell\}$ and the Pythagorean
Theorem (Theorem 4.5.2)
\[ \| u_L \|_2^2 = \sum_{\ell=-L}^{L} |\alpha_\ell|^2, \quad L \in \mathbb{N}. \tag{8.41} \]
Also, by (4.14) (with the substitution of $u$ for $x$ and of $(u_L - u)$ for $y$) we obtain
\[ \|u\|_2 - \| u - u_L \|_2 \le \| u_L \|_2 \le \|u\|_2 + \| u - u_L \|_2. \tag{8.42} \]
It now follows from (8.42), (8.40), and the Sandwich Theorem⁵ that
\[ \lim_{L \to \infty} \| u_L \|_2 = \|u\|_2, \tag{8.43} \]
which combines with (8.41) to prove (8.36c).


By applying Theorem 8.5.3 to the space of energy-limited signals that are band- 
limited to W Hz and to the CONS that we derived for that space in Proposi- 
tion 8.4.2 we obtain: 


Proposition 8.5.4. Any square-summable bi-infinite sequence of complex numbers 
corresponds to the samples at integer multiples of T of an energy-limited signal that 
is bandlimited to 1/(2T) Hz. Here T > 0 is arbitrary. 


Proof. Let $\ldots, \beta_{-1}, \beta_0, \beta_1, \ldots$ be a square-summable bi-infinite sequence of com-
plex numbers, and let $W = 1/(2T)$. We seek a signal $u$ that is an energy-limited
signal that is bandlimited to W Hz and whose samples are given by $u(\ell T) = \beta_\ell$
for every integer $\ell$. Since the set of all energy-limited signals that are bandlimited
to W Hz is a closed linear subspace of $\mathcal{L}_2$, and since the sequence $\{\check{\psi}_\ell\}$ (given ex-
plicitly in (8.28) as $\check{\psi}_\ell \colon t \mapsto \sqrt{2W} \operatorname{sinc}(2Wt + \ell)$) is an orthonormal sequence in that
subspace, it follows from Theorem 8.5.3 (with the substitution of $\check{\psi}_\ell$ for $\phi_\ell$ and of
$\beta_{-\ell}/\sqrt{2W}$ for $\alpha_\ell$) that there exists an energy-limited signal $u$ that is bandlimited
to W Hz and for which
\[ \langle u, \check{\psi}_\ell \rangle = \frac{1}{\sqrt{2W}} \, \beta_{-\ell}, \quad \ell \in \mathbb{Z}. \tag{8.44} \]
By Proposition 8.4.2,
\[ \langle u, \check{\psi}_\ell \rangle = \frac{1}{\sqrt{2W}} \, u\Bigl( -\frac{\ell}{2W} \Bigr), \quad \ell \in \mathbb{Z}, \tag{8.45} \]
so by (8.44) and (8.45)
\[ u(-\ell T) = \beta_{-\ell}, \quad \ell \in \mathbb{Z}. \]

⁵ The Sandwich Theorem states that if the sequences of real numbers $\{a_n\}$, $\{b_n\}$ and $\{c_n\}$ are
such that $b_n \le a_n \le c_n$ for every $n$, and if the sequences $\{b_n\}$ and $\{c_n\}$ converge to the same
limit, then $\{a_n\}$ also converges to that limit.


We now give an alternative characterization of a CONS for a closed subspace of $\mathcal{L}_2$.
This result will not be used later in the book.


Proposition 8.5.5 (Characterization of a CONS for a Closed Subspace).

(i) If the bi-infinite sequence $\{\phi_\ell\}$ is a CONS for the linear subspace $\mathcal{U} \subseteq \mathcal{L}_2$,
then an element of $\mathcal{U}$ whose inner product with $\phi_\ell$ is zero for every integer $\ell$
must have zero energy:
\[ \Bigl( \langle u, \phi_\ell \rangle = 0, \ \ell \in \mathbb{Z} \Bigr) \implies \Bigl( \|u\|_2 = 0 \Bigr), \quad u \in \mathcal{U}. \tag{8.46} \]

(ii) If $\mathcal{U}$ is a closed subspace of $\mathcal{L}_2$ and if the bi-infinite sequence $\{\phi_\ell\}$ satisfies
(8.4) & (8.5), then Condition (8.46) is equivalent to the condition that $\{\phi_\ell\}$
forms a CONS for $\mathcal{U}$.


Proof. We begin by proving Part (i). By definition, if $\{\phi_\ell\}$ is a CONS for $\mathcal{U}$, then
(8.6) must hold for every $u \in \mathcal{U}$. Consequently, if for some $u \in \mathcal{U}$ we have
that $\langle u, \phi_\ell \rangle$ is zero for all $\ell \in \mathbb{Z}$, then the RHS of (8.6) is zero and hence the LHS
must also be zero, thus showing that $u$ must be of zero energy.

We next turn to Part (ii) and assume that $\mathcal{U}$ is closed and that the bi-infinite
sequence $\{\phi_\ell\}$ satisfies (8.4) & (8.5). That the condition that $\{\phi_\ell\}$ is a CONS
implies Condition (8.46) follows from Part (i). It thus remains to show that if
Condition (8.46) holds, then $\{\phi_\ell\}$ is a CONS. To prove this we now assume that $\mathcal{U}$
is a closed subspace; that $\{\phi_\ell\}$ satisfies (8.4) & (8.5); and that (8.46) holds, and
set out to prove that
\[ \|u\|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl| \langle u, \phi_\ell \rangle \bigr|^2, \quad u \in \mathcal{U}. \tag{8.47} \]
To establish (8.47) fix some arbitrary $u \in \mathcal{U}$. Since $\mathcal{U} \subseteq \mathcal{L}_2$, the fact that $u$ is
in $\mathcal{U}$ implies that it is of finite energy, which combines with (8.3) to imply that the
bi-infinite sequence $\ldots, \langle u, \phi_{-1} \rangle, \langle u, \phi_0 \rangle, \langle u, \phi_1 \rangle, \ldots$ is square summable. Since,
by hypothesis, $\mathcal{U}$ is closed, this implies, by Theorem 8.5.3 (with the substitution
of $\langle u, \phi_\ell \rangle$ for $\alpha_\ell$), that there exists some element $\tilde{u} \in \mathcal{U}$ such that
\[ \lim_{L \to \infty} \Bigl\| \tilde{u} - \sum_{\ell=-L}^{L} \langle u, \phi_\ell \rangle \phi_\ell \Bigr\|_2 = 0; \tag{8.48a} \]
\[ \langle \tilde{u}, \phi_\ell \rangle = \langle u, \phi_\ell \rangle, \quad \ell \in \mathbb{Z}; \tag{8.48b} \]
and
\[ \| \tilde{u} \|_2^2 = \sum_{\ell=-\infty}^{\infty} \bigl| \langle u, \phi_\ell \rangle \bigr|^2. \tag{8.48c} \]
By (8.48b) it follows that the element $u - \tilde{u}$ of $\mathcal{U}$ satisfies
\[ \langle u - \tilde{u}, \phi_\ell \rangle = 0, \quad \ell \in \mathbb{Z}, \]
and hence, by Condition (8.46), is of zero energy
\[ \| u - \tilde{u} \|_2 = 0, \tag{8.49} \]
so $u$ and $\tilde{u}$ are indistinguishable and hence
\[ \|u\|_2 = \| \tilde{u} \|_2. \]
This combines with (8.48c) to prove (8.47).


8.6 An Isomorphism


In this section we collect the results of Theorem 8.4.3 and Proposition 8.5.4 into a 
single theorem about the isomorphism between the space of energy-limited signals 
that are bandlimited to W Hz and the space of square-summable sequences. This 
theorem is at the heart of quantization schemes for bandlimited signals. It demon- 
strates that to describe a bandlimited signal one can use discrete-time processing to 
quantize its samples and one can then map the quantized samples to a bandlimited 
signal. The energy in the error signal corresponding to the difference between the 
original signal and its description is then proportional to the sum of the squared 
differences between the samples of the original signal and the quantized version. 


Theorem 8.6.1 (Bandlimited Signals and Square-Summable Sequences). Let
$T = 1/(2W)$, where $W > 0$.

(i) If $u$ is an energy-limited signal that is bandlimited to W Hz, then the bi-
infinite sequence
\[ \ldots, u(-T), u(0), u(T), u(2T), \ldots \]
consisting of its samples taken at integer multiples of $T$ is square summable
and
\[ T \sum_{\ell=-\infty}^{\infty} |u(\ell T)|^2 = \|u\|_2^2. \]

(ii) More generally, if $u$ and $v$ are energy-limited signals that are bandlimited
to W Hz, then
\[ T \sum_{\ell=-\infty}^{\infty} u(\ell T) \, v^*(\ell T) = \langle u, v \rangle. \]

(iii) If $\{\alpha_\ell\}$ is a bi-infinite square-summable sequence, then there exists an energy-
limited signal $u$ that is bandlimited to W Hz such that its samples are given
by
\[ u(\ell T) = \alpha_\ell, \quad \ell \in \mathbb{Z}. \]

(iv) The mapping that maps every energy-limited signal that is bandlimited to W
Hz to the square-summable sequence consisting of its samples is linear.
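
The quantization remark at the start of this section can be illustrated with a small sketch. Everything specific in it is an assumption made for the example: the sampling interval `T`, the five hypothetical sample values in `alpha`, the uniform quantizer with step `step`, and the truncated Riemann sum used as an independent check of the error energy. The synthesis formula $\sum_\ell \alpha_\ell \operatorname{sinc}(t/T - \ell)$ is the sinc interpolation of Theorem 8.4.3 written for a finitely supported sequence.

```python
import numpy as np

T = 0.5                                         # sampling interval; W = 1/(2T) = 1 Hz
alpha = np.array([0.3, -1.2, 0.8, 2.1, -0.4])   # hypothetical samples u(0), ..., u(4T)
ells = np.arange(len(alpha))

def synthesize(samples, time):
    """Bandlimited signal sum_ell samples[ell] * sinc(time/T - ell)."""
    return sum(a * np.sinc(time / T - ell) for a, ell in zip(samples, ells))

# Part (iii): the synthesized signal takes the prescribed values at the epochs ell*T.
u_at_samples = synthesize(alpha, ells * T)

# Quantize the samples with a uniform step (an illustrative quantizer).
step = 0.5
quantized = step * np.round(alpha / step)

# Part (i) applied to the error signal: its energy equals T times the sum of
# the squared differences between original and quantized samples.
predicted_error_energy = T * np.sum((alpha - quantized) ** 2)

# Independent check: truncated Riemann sum of the squared error signal.
dt = 0.01
t = np.arange(-2000.0, 2000.0, dt)
error_signal = synthesize(alpha - quantized, t)
numerical_error_energy = np.sum(error_signal ** 2) * dt
print(predicted_error_energy, numerical_error_energy)
```

The two energies should agree up to the error from truncating the time axis, which is small here because the error signal decays like $1/t$.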


8.7 Prolate Spheroidal Wave Functions 


The following result, which is due to Slepian and Pollak, will not be used in this 
book; it is included for its sheer beauty. 


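Theorem 8.7.1. Let the positive constants $T > 0$ and $W > 0$ be given. Then
there exists a sequence of real functions $\phi_1, \phi_2, \ldots$ and a corresponding sequence
of positive numbers $\lambda_1 > \lambda_2 > \cdots$ such that:

(i) The sequence $\phi_1, \phi_2, \ldots$ forms a CONS for the space of energy-limited signals
that are bandlimited to W Hz, so, a fortiori,
\[ \int_{-\infty}^{\infty} \phi_\ell(t) \, \phi_{\ell'}(t) \, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{N}. \tag{8.50a} \]

(ii) The sequence of scaled and time-windowed functions $\tilde{\phi}_{1,\mathrm{w}}, \tilde{\phi}_{2,\mathrm{w}}, \ldots$ defined at
every $t \in \mathbb{R}$ by
\[ \tilde{\phi}_{\ell,\mathrm{w}}(t) = \frac{1}{\sqrt{\lambda_\ell}} \, \phi_\ell(t) \, \mathrm{I}\Bigl\{ |t| \le \frac{T}{2} \Bigr\}, \quad \ell \in \mathbb{N} \tag{8.50b} \]
forms a CONS for the subspace of $\mathcal{L}_2$ consisting of all energy-limited signals
that vanish outside the interval $[-T/2, T/2]$, so, a fortiori,
\[ \int_{-T/2}^{T/2} \phi_\ell(t) \, \phi_{\ell'}(t) \, dt = \lambda_\ell \, \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{N}. \tag{8.50c} \]

(iii) For every $t \in \mathbb{R}$,
\[ \int_{-T/2}^{T/2} \mathrm{LPF}_W(t - \tau) \, \phi_\ell(\tau) \, d\tau = \lambda_\ell \, \phi_\ell(t), \quad \ell \in \mathbb{N}. \tag{8.50d} \]

The above functions $\phi_1, \phi_2, \ldots$ are related to Prolate Spheroidal Wave Functions.
For a discussion of this connection, a proof of this theorem, and numerous appli-
cations see (Slepian and Pollak, 1961) and (Slepian, 1976).
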
8.8 Exercises


Exercise 8.1 (Expansion of a Function). Expand the function $t \mapsto \operatorname{sinc}^2(t/2)$ as an or-
thonormal expansion in the functions
\[ \ldots, \ t \mapsto \operatorname{sinc}(t + 2), \ t \mapsto \operatorname{sinc}(t + 1), \ t \mapsto \operatorname{sinc}(t), \ t \mapsto \operatorname{sinc}(t - 1), \ t \mapsto \operatorname{sinc}(t - 2), \ \ldots \]


Exercise 8.2 (Inner Product with a Bandlimited Signal). Show that if $x$ is an energy-
limited signal that is bandlimited to W Hz, and if $y \in \mathcal{L}_2$, then
\[ \langle x, y \rangle = T_s \sum_{\ell=-\infty}^{\infty} x(\ell T_s) \, y_{\mathrm{LPF}}^*(\ell T_s), \]
where $y_{\mathrm{LPF}}$ is the result of passing $y$ through an ideal unit-gain lowpass filter of bandwidth
W Hz, and where $T_s = 1/(2W)$.


Exercise 8.3 (Approximating a Sinc by Sincs). Find the coefficients $\{\alpha_\ell\}$ that minimize
the integral
\[ \int_{-\infty}^{\infty} \Bigl( \operatorname{sinc}(3t/2) - \sum_{\ell=-\infty}^{\infty} \alpha_\ell \operatorname{sinc}(t - \ell) \Bigr)^2 dt. \]
What is the value of this integral when the coefficients are chosen as you suggest?


Exercise 8.4 (Integrability and Summability). Show that if $x$ is an integrable signal that
is bandlimited to W Hz and if $T_s = 1/(2W)$, then
\[ \sum_{\ell=-\infty}^{\infty} |x(\ell T_s)| < \infty. \]
Hint: Let $h$ be the IFT of the mapping in (7.15) when we substitute $0$ for $f_c$; $2W$ for $W$;
and $2W + \Delta$ for $W_c$, where $\Delta > 0$. Express $x(\ell T_s)$ as $(x \star h)(\ell T_s)$; upper-bound the
convolution integral using Proposition 2.4.1; and use Fubini's Theorem to swap the order
of summation and integration.


Exercise 8.5 (Approximating an Integral by a Sum). One often approximates an integral
by a sum, e.g.,
\[ \int_{-\infty}^{\infty} x(t) \, dt \approx \delta \sum_{\ell=-\infty}^{\infty} x(\ell \delta). \]

(i) Show that if $u$ is an energy-limited signal that is bandlimited to W Hz, then, for
every $0 < \delta \le 1/(2W)$, the above approximation is exact when we substitute $|u(t)|^2$
for $x(t)$, that is,
\[ \int_{-\infty}^{\infty} |u(t)|^2 \, dt = \delta \sum_{\ell=-\infty}^{\infty} |u(\ell \delta)|^2. \]

(ii) Show that if $x$ is an integrable signal that is bandlimited to W Hz, then, for every
$0 < \delta \le 1/(2W)$,
\[ \int_{-\infty}^{\infty} x(t) \, dt = \delta \sum_{\ell=-\infty}^{\infty} x(\ell \delta). \]

(iii) Consider the signal $u \colon t \mapsto \operatorname{sinc}(t)$. Compute $\|u\|_2^2$ using Parseval's Theorem and
use the result and Part (i) to show that


Exercise 8.6 (On the Pointwise Sampling Theorem).

(i) Let the functions $g, g_0, g_1, \ldots$ be elements of $\mathcal{L}_2$ that are zero outside the interval
$[-W, W]$. Show that if $\| g - g_n \|_2 \to 0$, then for every $t \in \mathbb{R}$
\[ \lim_{n \to \infty} \int_{-\infty}^{\infty} g_n(f) \, e^{i 2 \pi f t} \, df = \int_{-\infty}^{\infty} g(f) \, e^{i 2 \pi f t} \, df. \]

(ii) Use Part (i) to prove the Pointwise Sampling Theorem for energy-limited signals.


Exercise 8.7 (Reconstructing from a Finite Number of Samples). Show that there does
not exist a universal positive integer $L$ such that at $t = T/2$
\[ \Bigl| x(t) - \sum_{\ell=-L}^{L} x(-\ell T) \operatorname{sinc}\Bigl( \frac{t}{T} + \ell \Bigr) \Bigr| < 0.1 \]
for all energy-limited signals $x$ that are bandlimited to $1/(2T)$ Hz.


Exercise 8.8 (Inner Product between Passband Signals). Let $x_{\mathrm{PB}}$ and $y_{\mathrm{PB}}$ be energy-
limited passband signals that are bandlimited to W Hz around the carrier frequency $f_c$.
Let $x_{\mathrm{BB}}$ and $y_{\mathrm{BB}}$ be their corresponding baseband representations. Let $T = 1/W$. Show
that
\[ \langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle = 2T \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}(\ell T) \, y_{\mathrm{BB}}^*(\ell T) \Bigr). \]


Exercise 8.9 (Closed Subspaces). Let $\mathcal{U}$ denote the set of energy-limited signals that
vanish outside some interval. Thus, $u$ is in $\mathcal{U}$ if, and only if, there exist $a, b \in \mathbb{R}$ (that may
depend on $u$) such that $u(t)$ is zero whenever $t \notin [a, b]$. Show that $\mathcal{U}$ is a linear subspace
of $\mathcal{L}_2$, but that it is not closed.


Exercise 8.10 (Projection onto an Infinite-Dimensional Subspace).

(i) Let $\mathcal{U} \subset \mathcal{L}_2$ be the set of all elements of $\mathcal{L}_2$ that are zero outside the interval
$[-1, +1]$. Given $v \in \mathcal{L}_2$, let $w$ be the signal $w \colon t \mapsto v(t) \, \mathrm{I}\{ |t| \le 1 \}$. Show that $w$ is
in $\mathcal{U}$ and that $v - w$ is orthogonal to every signal in $\mathcal{U}$.

(ii) Let $\mathcal{U}$ be the subspace of energy-limited signals that are bandlimited to W Hz.
Given $v \in \mathcal{L}_2$, define $w = v \star \mathrm{LPF}_W$. Show that $w$ is in $\mathcal{U}$ and that $v - w$ is
orthogonal to every signal in $\mathcal{U}$.


Exercise 8.11 (A Maximization Problem). Of all unit-energy real signals that are band-
limited to W Hz, which one has the largest value at $t = 0$? What is its value at $t = 0$?
Repeat for $t = 17$.

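| | $\check{\mathcal{V}}$ | $\mathcal{V}$ |
|---|---|---|
| The space | energy-limited signals that are bandlimited to W Hz | energy-limited functions that vanish outside the interval $[-W, W)$ |
| Generic element | $x \colon t \mapsto x(t)$ | $g \colon f \mapsto g(f)$ |
| A CONS | $\ldots, \check{\psi}_{-1}, \check{\psi}_0, \check{\psi}_1, \ldots$ | $\ldots, \psi_{-1}, \psi_0, \psi_1, \ldots$ |
| CONS element | $\check{\psi}_\ell(t) = \sqrt{2W} \operatorname{sinc}(2Wt + \ell)$ | $\psi_\ell(f) = \frac{1}{\sqrt{2W}} e^{i \pi \ell f / W} \, \mathrm{I}\{ -W \le f < W \}$ |
| Inner product | $\langle x, \check{\psi}_\ell \rangle = \int_{-\infty}^{\infty} x(t) \sqrt{2W} \operatorname{sinc}(2Wt + \ell) \, dt = \frac{1}{\sqrt{2W}} x\bigl( -\frac{\ell}{2W} \bigr)$ | $\langle g, \psi_\ell \rangle = \int_{-W}^{W} g(f) \frac{1}{\sqrt{2W}} e^{-i \pi \ell f / W} \, df$, which is $g$'s $\ell$-th Fourier Series Coefficient ($\triangleq c_\ell$) |
| Representation | Sampling Theorem: $\lim_{L \to \infty} \bigl\lVert x - \sum_{\ell=-L}^{L} \langle x, \check{\psi}_\ell \rangle \check{\psi}_\ell \bigr\rVert_2 = 0$ | Fourier Series: $\lim_{L \to \infty} \bigl\lVert g - \sum_{\ell=-L}^{L} \langle g, \psi_\ell \rangle \psi_\ell \bigr\rVert_2 = 0$ |
| i.e., | $\int_{-\infty}^{\infty} \bigl\lvert x(t) - \sum_{\ell=-L}^{L} x\bigl( -\frac{\ell}{2W} \bigr) \operatorname{sinc}(2Wt + \ell) \bigr\rvert^2 dt \to 0$ | $\int_{-W}^{W} \bigl\lvert g(f) - \sum_{\ell=-L}^{L} c_\ell \frac{1}{\sqrt{2W}} e^{i \pi \ell f / W} \bigr\rvert^2 df \to 0$ |

Table 8.1: The duality between the Sampling Theorem and the Fourier Series Representation.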


Chapter 9 


Sampling Real Passband Signals 


9.1 Introduction 


In this chapter we present a procedure for representing a real energy-limited pass-
band signal that is bandlimited to W Hz around a carrier frequency $f_c$ using com-
plex numbers that we accumulate at a rate of W complex numbers per second.
Alternatively, since we can represent every complex number as a pair of real num-
bers (its real and imaginary parts), we can view our procedure as allowing us to
represent the signal using real numbers that we accumulate at a rate of 2W real
numbers per second. Thus we propose to accumulate


2W real samples per second,


or


W complex samples per second.


Note that the carrier frequency $f_c$ plays no role here (provided, of course, that
$f_c > W/2$): the rate at which we accumulate real numbers to describe the passband
signal does not depend on $f_c$.¹


For real baseband signals this feat is easily accomplished using the Sampling The- 
orem as follows. A real energy-limited baseband signal that is bandlimited to W 
Hz can be reconstructed from its (real) samples that are taken 1/(2W) seconds 
apart (Theorem 8.4.3), so the signal can be reconstructed from real numbers (its 
samples) that are being accumulated at the rate of 2W real samples per second. 


For passband signals we cannot achieve this feat by invoking the Sampling Theorem
directly. Even though, by Corollary 7.7.3, every energy-limited passband signal $x_{\mathrm{PB}}$
that is bandlimited to W Hz around the center frequency $f_c$ is also an energy-limited
bandlimited (baseband) signal, we are only guaranteed that $x_{\mathrm{PB}}$ be bandlimited
to $f_c + W/2$ Hz. Consequently, if we were to apply the Sampling Theorem directly
to $x_{\mathrm{PB}}$ we would have to sample $x_{\mathrm{PB}}$ every $1/(2 f_c + W)$ seconds, i.e., we would
have to accumulate $2 f_c + W$ real numbers per second, which can be much higher
than $2W$, especially in wireless communications where $f_c \gg W$.

¹ But the carrier frequency $f_c$ does play a role in the reconstruction.


Instead of applying the Sampling Theorem directly to $x_{\mathrm{PB}}$, the idea is to apply it to
$x_{\mathrm{PB}}$'s baseband representation $x_{\mathrm{BB}}$. Suppose that $x_{\mathrm{PB}}$ is a real energy-limited pass-
band signal that is bandlimited to W Hz around the carrier frequency $f_c$. By Theo-
rem 7.7.12 (vii), it can be represented using its baseband representation $x_{\mathrm{BB}}$, which
is a complex baseband signal that is bandlimited to W/2 Hz (Theorem 7.7.12 (v)).
Consequently, by the $\mathcal{L}_2$-Sampling Theorem (Theorem 8.4.3), $x_{\mathrm{BB}}$ can be described
by sampling it at a rate of W samples per second. Since the baseband signal is
complex, its samples are also, in general, complex. Thus, in sampling $x_{\mathrm{BB}}$ every
1/W seconds we are accumulating one complex sample every 1/W seconds. Since
we can recover $x_{\mathrm{PB}}$ from $x_{\mathrm{BB}}$ and $f_c$, it follows that, as we wanted, we have found
a way to describe $x_{\mathrm{PB}}$ using complex numbers that are accumulated at a rate of W
complex numbers per second.


9.2 Complex Sampling 


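Recall from Section 7.7.3 (Theorem 7.7.12) that a real energy-limited passband
signal $x_{\mathrm{PB}}$ that is bandlimited to W Hz around a carrier frequency $f_c$ can be
represented using its baseband representation $x_{\mathrm{BB}}$ as
\[ x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\bigl( e^{i 2 \pi f_c t} \, x_{\mathrm{BB}}(t) \bigr), \quad t \in \mathbb{R}, \tag{9.1} \]
where $x_{\mathrm{BB}}$ is given by
\[ x_{\mathrm{BB}} = \bigl( t \mapsto e^{-i 2 \pi f_c t} \, x_{\mathrm{PB}}(t) \bigr) \star \mathrm{LPF}_{W_c}, \tag{9.2} \]
and where the cutoff frequency $W_c$ can be chosen arbitrarily in the range
\[ \frac{W}{2} \le W_c \le 2 f_c - \frac{W}{2}. \tag{9.3} \]
The signal $x_{\mathrm{BB}}$ is an energy-limited complex baseband signal that is bandlimited
to W/2 Hz. Being bandlimited to W/2 Hz, it follows from the $\mathcal{L}_2$-Sampling The-
orem that $x_{\mathrm{BB}}$ can be reconstructed from its samples taken $1/(2(W/2)) = 1/W$
seconds apart. We denote these samples by
\[ x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr), \quad \ell \in \mathbb{Z}, \tag{9.4} \]
so, by (9.2),
\[ x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) = \Bigl( \bigl( t \mapsto e^{-i 2 \pi f_c t} \, x_{\mathrm{PB}}(t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl( \frac{\ell}{W} \Bigr), \quad \ell \in \mathbb{Z}. \tag{9.5} \]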


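These samples are, in general, complex. Their real part corresponds to the samples
of the in-phase component $\operatorname{Re}(x_{\mathrm{BB}})$, which, by (7.41a), is given by
\[ \operatorname{Re}(x_{\mathrm{BB}}) = \bigl( t \mapsto x_{\mathrm{PB}}(t) \cos(2 \pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \tag{9.6} \]
(for $W_c$ satisfying (9.3)) and their imaginary part corresponds to the samples of
the quadrature-component $\operatorname{Im}(x_{\mathrm{BB}})$, which, by (7.41b), is given by
\[ \operatorname{Im}(x_{\mathrm{BB}}) = -\bigl( t \mapsto x_{\mathrm{PB}}(t) \sin(2 \pi f_c t) \bigr) \star \mathrm{LPF}_{W_c}. \tag{9.7} \]
Thus,
\begin{align}
x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) &= \Bigl( \bigl( t \mapsto x_{\mathrm{PB}}(t) \cos(2 \pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl( \frac{\ell}{W} \Bigr) \nonumber \\
&\quad - i \Bigl( \bigl( t \mapsto x_{\mathrm{PB}}(t) \sin(2 \pi f_c t) \bigr) \star \mathrm{LPF}_{W_c} \Bigr)\Bigl( \frac{\ell}{W} \Bigr), \quad \ell \in \mathbb{Z}. \tag{9.8}
\end{align}

[Figure 9.1: Sampling of a real passband signal $x_{\mathrm{PB}}$. Block diagram: $x_{\mathrm{PB}}(t)$ is multiplied by $\cos(2 \pi f_c t)$ and, via a 90° phase shifter, by $-\sin(2 \pi f_c t)$; each product is fed to a lowpass filter $\mathrm{LPF}_{W_c}$ with $W/2 \le W_c \le 2 f_c - W/2$ to produce $\operatorname{Re}(x_{\mathrm{BB}}(t))$ and $\operatorname{Im}(x_{\mathrm{BB}}(t))$, which are sampled at the times $\ell/W$ to yield $\operatorname{Re}(x_{\mathrm{BB}}(\ell/W))$ and $\operatorname{Im}(x_{\mathrm{BB}}(\ell/W))$.]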


The procedure of taking a real passband signal $x_{\mathrm{PB}}$ and sampling its baseband
representation to obtain the samples (9.8) is called complex sampling. It is
depicted in Figure 9.1. The passband signal $x_{\mathrm{PB}}$ is first separately multiplied
by $t \mapsto \cos(2 \pi f_c t)$ and by $t \mapsto -\sin(2 \pi f_c t)$, which are generated using a local
oscillator and a 90°-phase shifter. Each result is fed to a lowpass filter with cutoff
frequency $W_c$ to produce the in-phase and quadrature component respectively.
Each component is then sampled at a rate of W real samples per second.


9.3 Reconstructing $x_{\mathrm{PB}}$ from its Complex Samples


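By the Pointwise Sampling Theorem (Theorem 8.4.5) applied to the energy-limited
signal $x_{\mathrm{BB}}$ (which is bandlimited to W/2 Hz) we obtain
\[ x_{\mathrm{BB}}(t) = \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) \operatorname{sinc}(Wt - \ell), \quad t \in \mathbb{R}. \tag{9.9} \]
Consequently, by (9.1), $x_{\mathrm{PB}}$ can be reconstructed from its complex samples as
\[ x_{\mathrm{PB}}(t) = 2 \operatorname{Re}\Bigl( e^{i 2 \pi f_c t} \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) \operatorname{sinc}(Wt - \ell) \Bigr), \quad t \in \mathbb{R}. \tag{9.10a} \]
Since the sinc(·) function is real, this can also be written as
\[ x_{\mathrm{PB}}(t) = 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Re}\Bigl( e^{i 2 \pi f_c t} \, x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) \Bigr) \operatorname{sinc}(Wt - \ell), \quad t \in \mathbb{R}, \tag{9.10b} \]
or, using real operations, as
\begin{align}
x_{\mathrm{PB}}(t) &= 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Re}\Bigl( x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) \Bigr) \operatorname{sinc}(Wt - \ell) \cos(2 \pi f_c t) \nonumber \\
&\quad - 2 \sum_{\ell=-\infty}^{\infty} \operatorname{Im}\Bigl( x_{\mathrm{BB}}\Bigl( \frac{\ell}{W} \Bigr) \Bigr) \operatorname{sinc}(Wt - \ell) \sin(2 \pi f_c t), \quad t \in \mathbb{R}. \tag{9.10c}
\end{align}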
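
The reconstruction formula (9.10a) can be tried out numerically. The sketch below is illustrative only and rests on assumptions that are not part of the text: the values of `W` and `fc`, and a hypothetical complex baseband signal `x_bb` given in closed form (built from $\operatorname{sinc}^2$ pulses, which are bandlimited to W/2 Hz). It skips the demodulation and lowpass filtering of Figure 9.1 and evaluates the samples $x_{\mathrm{BB}}(\ell/W)$ directly; the bi-infinite sum is truncated to $|\ell| \le L$.

```python
import numpy as np

W = 2.0        # passband bandwidth in Hz (arbitrary choice)
fc = 10.0      # carrier frequency in Hz (arbitrary choice, fc > W/2)

def x_bb(time):
    """Hypothetical complex baseband signal, bandlimited to W/2 Hz.

    t -> sinc^2(W t / 2) is bandlimited to W/2 Hz, and so is its time shift.
    """
    return ((1.0 + 0.5j) * np.sinc(W * time / 2) ** 2
            + 0.3j * np.sinc(W * (time - 0.2) / 2) ** 2)

def x_pb(time):
    """Passband signal obtained from x_bb via (9.1)."""
    return 2 * np.real(np.exp(2j * np.pi * fc * time) * x_bb(time))

def reconstruct(time, L):
    """Truncated version of (9.10a) using the samples x_bb(ell / W), |ell| <= L."""
    ell = np.arange(-L, L + 1)
    baseband = np.sum(x_bb(ell / W) * np.sinc(W * time - ell))
    return 2 * np.real(np.exp(2j * np.pi * fc * time) * baseband)

test_times = [-0.37, 0.05, 0.81]
max_error = max(abs(x_pb(t0) - reconstruct(t0, 2000)) for t0 in test_times)
print(max_error)
```

Note that the samples are taken only W times per second even though the passband signal oscillates at roughly `fc` Hz; the carrier enters only in the final modulation step.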
ae me (w) 


As we next show, we can obtain another form of convergence using the £2-Sampling 
Theorem (Theorem 8.4.3). We first note that by that theorem 


a 0. (9.11) 


L-o0o 2 


L 
e 
lim |: Ee tpp(t) om y ZBB\| wa, sinc(Wt = e) 
Xl) 


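We next note that x_BB is the baseband representation of x_PB and that (as can be
verified directly or by using Proposition 7.7.9) the mapping

\[ t \mapsto x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell) \]

is the baseband representation of the real passband signal

\[ t \mapsto 2\operatorname{Re}\Bigl(e^{i2\pi f_c t}\, x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr). \]

Consequently, by linearity (Theorem 7.7.12 (ii)), the mapping

\[ t \mapsto x_{\mathrm{BB}}(t) - \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell) \]

is the baseband representation of the real passband signal

\[ t \mapsto x_{\mathrm{PB}}(t) - 2\operatorname{Re}\Bigl(e^{i2\pi f_c t} \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr) \]

and hence, by Theorem 7.7.12 (iii),

\[
\begin{aligned}
& \Bigl\| t \mapsto x_{\mathrm{PB}}(t) - 2\operatorname{Re}\Bigl(e^{i2\pi f_c t} \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr) \Bigr\|_2^2 \\
& \qquad = 2\, \Bigl\| t \mapsto x_{\mathrm{BB}}(t) - \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell) \Bigr\|_2^2 .
\end{aligned}
\tag{9.12}
\]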
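Combining (9.11) with (9.12) yields the L_2 convergence

\[ \lim_{L \to \infty} \Bigl\| t \mapsto x_{\mathrm{PB}}(t) - 2\operatorname{Re}\Bigl(e^{i2\pi f_c t} \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr) \Bigr\|_2^2 = 0. \tag{9.13} \]
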
We summarize how a passband signal can be reconstructed from the samples of its 
baseband representation in the following theorem. 


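Theorem 9.3.1 (The Sampling Theorem for Passband Signals). Let x_PB be a
real energy-limited passband signal that is bandlimited to W Hz around the carrier
frequency f_c. For every integer ℓ, let x_BB(ℓ/W) denote the time-ℓ/W sample of the
baseband representation x_BB of x_PB; see (9.5) and (9.8).


(i) x_PB can be pointwise reconstructed from the samples using the relation

\[ x_{\mathrm{PB}}(t) = 2\operatorname{Re}\Bigl(e^{i2\pi f_c t} \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr), \quad t \in \mathbb{R}. \]

(ii) x_PB can also be reconstructed from the samples in the L_2 sense

\[ \lim_{L \to \infty} \int_{-\infty}^{\infty} \Bigl( x_{\mathrm{PB}}(t) - 2\operatorname{Re}\Bigl(e^{i2\pi f_c t} \sum_{\ell=-L}^{L} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\,\operatorname{sinc}(Wt - \ell)\Bigr) \Bigr)^2 \, dt = 0. \]

(iii) The energy in x_PB can be reconstructed from the sum of the squared magnitudes
of the samples via

\[ \| x_{\mathrm{PB}} \|_2^2 = \frac{2}{W} \sum_{\ell=-\infty}^{\infty} \Bigl| x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr) \Bigr|^2 . \]

(iv) If y_PB is another real energy-limited passband signal that is bandlimited to
W Hz around f_c, and if {y_BB(ℓ/W)} are the samples of its baseband
representation, then

\[ \langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle = \frac{2}{W} \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\, y_{\mathrm{BB}}^{*}\Bigl(\frac{\ell}{W}\Bigr) \Bigr). \]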


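Proof. Part (i) is just a restatement of (9.10b). Part (ii) is a restatement of (9.13).
Part (iii) is a special case of Part (iv) corresponding to y_PB being equal to x_PB. It
thus only remains to prove Part (iv). This is done by noting that if x_BB and y_BB
are the baseband representations of x_PB and y_PB, then, by Theorem 7.7.12 (iv),

\[
\begin{aligned}
\langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle &= 2\operatorname{Re}\bigl( \langle x_{\mathrm{BB}}, y_{\mathrm{BB}} \rangle \bigr) \\
&= \frac{2}{W} \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr)\, y_{\mathrm{BB}}^{*}\Bigl(\frac{\ell}{W}\Bigr) \Bigr),
\end{aligned}
\]

where the second equality follows from Theorem 8.4.3 (iii).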
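The energy relation of Part (iii) lends itself to a quick numerical sanity check. The sketch below is an illustration under assumptions: it picks two nonzero complex samples and values of W and f_c arbitrarily, builds the passband signal from the pointwise formula of Part (i), and approximates its energy by a truncated Riemann sum. Because the sinc tails decay slowly, the numerical value falls slightly short of the prediction.

```python
import cmath
import math
from typing import Mapping


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def passband_value(samples: Mapping[int, complex], W: float, fc: float, t: float) -> float:
    """x_PB(t) = 2 Re( e^{i 2 pi fc t} * sum_l x_BB(l/W) sinc(W t - l) )."""
    baseband = sum(a * sinc(W * t - l) for l, a in samples.items())
    return 2.0 * (cmath.exp(2j * math.pi * fc * t) * baseband).real


def energy_from_samples(samples: Mapping[int, complex], W: float) -> float:
    """Energy predicted by Part (iii): (2/W) * sum_l |x_BB(l/W)|^2."""
    return (2.0 / W) * sum(abs(a) ** 2 for a in samples.values())


def energy_by_integration(samples: Mapping[int, complex], W: float, fc: float,
                          half_width: float, step: float) -> float:
    """Riemann-sum approximation of the integral of x_PB(t)^2 over [-half_width, half_width]."""
    n = int(round(half_width / step))
    return step * sum(passband_value(samples, W, fc, k * step) ** 2 for k in range(-n, n + 1))


# Hypothetical samples and parameters (assumptions chosen for illustration only).
samples = {0: 1.0 + 1.0j, 1: -0.5j}
W, fc = 1.0, 5.0

predicted = energy_from_samples(samples, W)
numeric = energy_by_integration(samples, W, fc, half_width=100.0, step=0.01)
print(predicted, numeric)
```

The step of 0.01 is far finer than the fastest oscillation in the squared signal, so the remaining discrepancy is due to truncating the integration range.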
Using the isomorphism between the family of complex square-summable sequences
and the family of energy-limited signals that are bandlimited to W Hz (Theo- 
rem 8.6.1), and using the relationship between real energy-limited passband signals 
and their baseband representation (Theorem 7.7.12), we can readily establish the 
following isomorphism between the family of complex square-summable sequences 
and the family of real energy-limited passband signals. 


Theorem 9.3.2 (Real Passband Signals and Square-Summable Sequences). Let
f_c, W, and T be constants satisfying

\[ f_c > W/2 > 0, \qquad T = 1/W. \]

(i) If x_PB is a real energy-limited passband signal that is bandlimited to W Hz
around f_c, and if x_BB is its baseband representation, then the bi-infinite
sequence consisting of the samples of x_BB at integer multiples of T

\[ \ldots,\ x_{\mathrm{BB}}(-T),\ x_{\mathrm{BB}}(0),\ x_{\mathrm{BB}}(T),\ x_{\mathrm{BB}}(2T),\ \ldots \]

is a square-summable sequence of complex numbers and

\[ 2T \sum_{\ell=-\infty}^{\infty} \bigl| x_{\mathrm{BB}}(\ell T) \bigr|^2 = \| x_{\mathrm{PB}} \|_2^2 . \]

(ii) More generally, if x_PB and y_PB are real energy-limited passband signals that
are bandlimited to W Hz around the carrier frequency f_c, and if x_BB and
y_BB are their baseband representations, then

\[ 2T \operatorname{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} x_{\mathrm{BB}}(\ell T)\, y_{\mathrm{BB}}^{*}(\ell T) \Bigr) = \langle x_{\mathrm{PB}}, y_{\mathrm{PB}} \rangle . \]

(iii) If ..., α_{−1}, α_0, α_1, ... is a square-summable bi-infinite sequence of complex
numbers, then there exists a real energy-limited passband signal x_PB that is
bandlimited to W Hz around the carrier frequency f_c such that the samples
of its baseband representation x_BB are given by

\[ x_{\mathrm{BB}}(\ell T) = \alpha_\ell, \quad \ell \in \mathbb{Z}. \]

(iv) The mapping of every real energy-limited passband signal that is bandlimited
to W Hz around f_c to the square-summable sequence consisting of the samples
of its baseband representation is linear (over R).


9.4 Exercises 


Exercise 9.1 (A Specific Signal). Let x be a real energy-limited passband signal that
is bandlimited to W Hz around the carrier frequency f_c. Suppose that all its complex
samples are zero except for its zeroth complex sample, which is given by 1 + i. What
is x?


Exercise 9.2 (Real Passband Signals whose Complex Samples Are Real). Characterize
the Fourier Transforms of real energy-limited passband signals that are bandlimited to W
Hz around the carrier frequency f_c and whose complex samples are real.


Exercise 9.3 (Multiplying by a Carrier). Let x be a real energy-limited signal that is
bandlimited to W/2 Hz, and let f_c be larger than W/2. Express the complex samples of
t ↦ x(t) cos(2πf_c t) in terms of x. Repeat for t ↦ x(t) sin(2πf_c t).


Exercise 9.4 (Naively Sampling a Passband Signal).


(i) Consider the signal x: t ↦ m(t) sin(2πf_c t), where m(·) is an integrable signal that
is bandlimited to 100 Hz and where f_c = 100 MHz. Can x be recovered from its
samples ..., x(−T), x(0), x(T), ... when 1/T = 100 MHz?

(ii) Consider now the general case where x is an integrable real passband signal that is
bandlimited to W Hz around the carrier frequency f_c. Find conditions guaranteeing
that x be reconstructible from its samples ..., x(−T), x(0), x(T), ...


Exercise 9.5 (Orthogonal Passband Signals). Let x_PB and y_PB be real energy-limited
passband signals that are bandlimited to W Hz around the carrier frequency f_c. Under
what conditions on their complex samples are they orthogonal?


Exercise 9.6 (Sampling a Baseband Signal As Though It Were a Passband Signal). Recall
that, ignoring some technicalities, a real baseband signal x of bandwidth W Hz can be
viewed as a real passband signal of bandwidth W around the carrier frequency f_c, where
f_c = W/2 (Problem 7.3). Compare the reconstruction formula for x from its samples to
the reconstruction formula for x from its complex samples.


Exercise 9.7 (Multiplying the Complex Samples). Let x be a real energy-limited passband
signal that is bandlimited to W Hz around the carrier frequency f_c. Let ..., x_{−1}, x_0, x_1, ...
denote its complex samples taken 1/W second apart. Let y be a real energy-limited
passband signal that is bandlimited to W Hz around the carrier frequency f_c and whose
complex samples are like those of x but multiplied by i. Relate the FT of y to the FT
of x.


Exercise 9.8 (Delayed Complex Sampling). Let x and y be real energy-limited passband
signals that are bandlimited to W Hz around the carrier frequency f_c. Suppose that the
complex samples of y are the same as those of x, but delayed by one:

\[ y_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr) = x_{\mathrm{BB}}\Bigl(\frac{\ell - 1}{W}\Bigr), \quad \ell \in \mathbb{Z}. \]

How are x and y related? Is y a delayed version of x?


Exercise 9.9 (On the Family of Real Passband Signals). Is the set of all real
energy-limited passband signals that are bandlimited to W Hz around the carrier frequency f_c
a linear subspace of the set of all complex energy-limited signals?


Exercise 9.10 (Complex Sampling and Inner Products). Show that the ℓ-th complex
sample x_BB(ℓ/W) of any real energy-limited passband signal that is bandlimited to W
Hz around the carrier frequency f_c can be expressed as an inner product

\[ x_{\mathrm{BB}}\Bigl(\frac{\ell}{W}\Bigr) = \langle x, \phi_\ell \rangle, \quad \ell \in \mathbb{Z}, \]

where ..., ϕ_{−1}, ϕ_0, ϕ_1, ... are orthogonal equi-energy complex signals. Is ϕ_ℓ in general
a delayed version of ϕ_0?


Exercise 9.11 (Absolute Summability of the Complex Samples). Show that the complex
samples of a real integrable passband signal that is bandlimited to W Hz around the
carrier frequency f_c must be absolutely summable.


Hint: See Exercise 8.4.


Exercise 9.12 (The Convolution Revisited). Let x and y be real integrable passband
signals that are bandlimited to W Hz around the carrier frequency f_c. Express the
complex samples of x ⋆ y in terms of those of x and y.


Exercise 9.13 (Complex Sampling and Filtering). Let x be a real integrable passband
signal that is bandlimited to W Hz around the carrier frequency f_c, and let h be the
impulse response of a real stable filter. Relate the complex samples of x ⋆ h to those of x
and h ⋆ BPF_{W,f_c}.


Chapter 10 


Mapping Bits to Waveforms 


10.1 What Is Modulation? 


Data bits are mathematical entities that have no physical attributes. To send them 
over a channel, one needs to first map them into some physical signal, which is 
then “fed” into a channel to produce a physical signal at the channel’s output. For 
example, when we send data over a telephone line, the data bits are first converted 
to an electrical signal, which then influences the voltage measured at the other 
end of the line. (We use the term “influences” because the signal measured at the 
other end of the line is usually not identical to the channel input: it is typically 
attenuated and also corrupted by thermal noise and other distortions introduced 
by various conversions in the telephone exchange system.) Similarly, in a wireless 
system, the data bits are mapped to an electromagnetic wave that then influences 
the electromagnetic field measured at the receiver antenna. In magnetic recording, 
data bits are written onto a magnetic medium by a mapping that maps them to 
a magnetization pattern, which is then measured (with some distortion and some 
noise) by the magnetic head at some later time when the data are read. 


In the first example the bits are mapped to continuous-time waveforms correspond- 
ing to the voltage across an impedance, whereas in the last example the bits are 
mapped to a spatial waveform corresponding to different magnetizations at dif- 
ferent locations across the magnetic medium. While some of the theory we shall 
develop holds for both cases, we shall focus here mainly on channels of the former 
type, where the channel input signal is some function of time rather than space. 


We shall further focus on cases where the channel input corresponds to a
time-varying voltage across a resistor, a time-varying current through a resistor, or a
time-varying electric field, so the energy required to transmit the signal is
proportional to the time integral of its square. Thus, if x(t) denotes the channel input at
time t, then we shall refer to ∫_t^{t+Δ} x²(τ) dτ as the transmitted energy during the
time interval beginning at time t and ending at time t + Δ.


There are many mappings of bits to waveforms, and our goal is to find “good” ones. 
We will, of course, have to define some figures of merit to compare the quality of 
different mappings. We shall refer to the mapping of bits to a physical waveform 
as modulation and to the part of the system that performs the modulation as the 
modulator.


Without going into too much detail, we can list a few qualitative requirements of a 
modulator. The modulation should be robust with respect to channel impairments, 
so that the receiver at the other end of the channel can reliably decode the data bits 
from the channel output. Also, the modulator should have reasonable complexity. 
Finally, in many applications we require that the transmitted signal be of limited 
power so as to preserve the battery. In wireless applications the transmitted signal 
may also be subject to spectral restrictions so as to not interfere with other systems. 


10.2 Modulating One Bit 


One does not typically expect to design a communication system in order to convey 
only one data bit. The purpose of the modulator is typically to map an entire bit 
stream to a waveform that extends over the entire life of the communication system. 
Nevertheless, for pedagogic reasons, it is good to first consider the simplest scenario 
of modulating a single bit. In this case the modulator is fully characterized by two 
functions x_0(·) and x_1(·) with the understanding that if the data bit D is equal
to zero, then the modulator produces the waveform x_0(·) and that otherwise it
produces x_1(·). Thus, the signal produced by the modulator is given by

\[ X(t) = \begin{cases} x_0(t) & \text{if } D = 0, \\ x_1(t) & \text{if } D = 1, \end{cases} \qquad t \in \mathbb{R}. \tag{10.1} \]


For example, we could choose 


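\[ x_0(t) = \begin{cases} A\,e^{-t/\mathrm{T}} & \text{if } t/\mathrm{T} \ge 0, \\ 0 & \text{otherwise,} \end{cases} \qquad t \in \mathbb{R}, \]

and

\[ x_1(t) = \begin{cases} A & \text{if } 0 \le t/\mathrm{T} \le 1, \\ 0 & \text{otherwise,} \end{cases} \qquad t \in \mathbb{R}, \]

where T = 1 sec and where A is a constant such that A² has units of power.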


This may seem like an odd way of writing these waveforms, but we have our 
reasons: we typically think of t as having units of time, and we try to avoid 
applying transcendental functions (such as the exponential function) to quantities 
with units. Also, we think of the squared transmitted waveform as having units 
of power, whereas we think of the transcendental functions as returning unit-less
values. Hence the introduction of the constant A with the understanding that
A² has units of power.


We denoted the bit to be sent by an uppercase letter (D) because we like to de- 
note random quantities (such as random variables, random vectors, and stochastic 
processes) by uppercase letters, and we think of the transmitted bit as a random 
quantity. Indeed, if the transmitted bit were deterministic, there would be no 
need to transmit it! This may seem like a statement made in jest, but it is actually
very important. In the first half of the twentieth century, engineers often
analyzed the performance of (analog) communication systems by analyzing their
performance in transmitting some particular signal, e.g., a sine wave. Nobody, of 
course, transmitted such “boring” signals, because those could always be produced 
at the receiver using a local oscillator. In the second half of the twentieth century, 
especially following the work of Claude Shannon, engineers realized that it is only 
meaningful to view the data to be transmitted as random, i.e., as quantities that 
are unknown at the receiver and also unknown to the system designer prior to the 
system’s deployment. We thus view the bit to be sent D as a random variable. 
Often we will assume that it takes on the values 0 and 1 equiprobably. This is a 
good assumption if prior to transmission a data compression algorithm is used. 


By the same token, we view the transmitted signal as a random quantity, and
hence the uppercase X. In fact, if we employ the above signaling scheme, then at
every time instant t′ ∈ R the value X(t′) of the transmitted waveform is a random
variable. For example, at time T/2 the value of the transmitted waveform is X(T/2),
which is a random variable that takes on the values A e^{−1/2} and A equiprobably.
Similarly, at time 2T the value of the transmitted waveform is X(2T), which is a
random variable taking on the values A e^{−2} and 0 equiprobably. Mathematicians call
such a waveform a random process or a stochastic process (SP). This will be
defined formally in Section 12.2.


It is useful to think about a random process as a function of two arguments: time 
and “luck” or, more precisely, as a function of time and the result of all the random 
experiments in the system. For a fixed instant of time t ∈ R, we have that X(t)
is a random variable, i.e., a real-valued function of the randomness in the system 
(in this case the realization of D). Alternatively, for a fixed realization of the 
randomness in the system, the random process is a deterministic function of time. 
These two views will be used interchangeably in this book. 
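The two views of a random process can be made concrete with the one-bit example of Section 10.2. The sketch below is an illustration under assumptions: it takes x_0 to be a decaying exponential starting at time 0 and x_1 to be a rectangular pulse on [0, T], with the hypothetical numerical values A = 1 and T = 1; the function names are ours.

```python
import math

A = 1.0  # amplitude (assumed value; A**2 plays the role of power)
T = 1.0  # time constant in seconds


def x0(t: float) -> float:
    """Waveform sent when D = 0: a decaying exponential starting at time 0."""
    return A * math.exp(-t / T) if t / T >= 0 else 0.0


def x1(t: float) -> float:
    """Waveform sent when D = 1: a rectangular pulse on [0, T]."""
    return A if 0 <= t / T <= 1 else 0.0


def transmitted(d: int, t: float) -> float:
    """X(t) as a function of two arguments: the realization d of D and the time t."""
    return x0(t) if d == 0 else x1(t)


def values_at(t: float) -> dict:
    """Fix the time: X(t) is a random variable, here listed by realization of D."""
    return {d: transmitted(d, t) for d in (0, 1)}


def sample_path(d: int, times: list) -> list:
    """Fix the randomness: the process is a deterministic function of time."""
    return [transmitted(d, t) for t in times]


print(values_at(T / 2))   # the two equiprobable values of X(T/2)
print(values_at(2 * T))   # the two equiprobable values of X(2T)
print(sample_path(1, [-0.5, 0.0, 0.5, 1.0, 1.5]))
```

Here `values_at` fixes the time and lists the outcomes over the realizations of D, whereas `sample_path` fixes the realization and returns a function of time sampled on a grid.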


10.3 From Bits to Real Numbers 


Many of the popular modulation schemes can be viewed as operating in two stages. 
In the first stage the data bits are mapped to real numbers, and in the second stage 
the real numbers are mapped to a continuous-time waveform. If we denote by k the 
number of data bits that will be transmitted by the system during its lifetime (or 
from the moment it is turned on until it is turned off), and if we denote the data 
bits by D_1, D_2, ..., D_k, then the first stage can be described as the application of
a mapping φ(·) that maps length-k sequences of bits to length-n sequences of real
numbers:

\[ \varphi\colon \{0,1\}^k \to \mathbb{R}^n, \qquad (d_1, \ldots, d_k) \mapsto (x_1, \ldots, x_n). \]


From an engineering point of view, it makes little sense to allow for the encoding 
function to map two different binary k-tuples to the same real n-tuple, because 
this would result in the transmitted waveforms corresponding to the two k-tuples 
being identical. This may cause errors even in the absence of noise. We shall 
therefore assume throughout that the mapping φ(·) is one-to-one (injective) so
no two distinct data k-tuples are mapped to the same n-tuple of real numbers.


An example of a mapping that maps bits to real numbers is the mapping that maps
each data bit D_j to the real number X_j according to the rule

\[ X_j = \begin{cases} +1 & \text{if } D_j = 0, \\ -1 & \text{if } D_j = 1, \end{cases} \qquad j = 1, \ldots, k. \tag{10.2} \]

In this example one real symbol X_j is produced for every data bit, so n = k. For
this reason we say that this mapping has the rate of one bit per real symbol.


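As another example consider the case where k is even and the data bits {D_j} are
broken into pairs

\[ (D_1, D_2),\ (D_3, D_4),\ \ldots,\ (D_{k-1}, D_k) \]

and each pair of data bits is then mapped to a single real number according to the
rule

\[ (D_{2j-1}, D_{2j}) \mapsto \begin{cases} +3 & \text{if } D_{2j-1} = D_{2j} = 0, \\ +1 & \text{if } D_{2j-1} = 0 \text{ and } D_{2j} = 1, \\ -3 & \text{if } D_{2j-1} = D_{2j} = 1, \\ -1 & \text{if } D_{2j-1} = 1 \text{ and } D_{2j} = 0, \end{cases} \qquad j = 1, \ldots, k/2. \tag{10.3} \]
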

In this case n = k/2, and we say that the mapping has the rate of two bits per real 
symbol. 


Note that the rate of the mapping could also be a fraction. Indeed, if each data
bit D_j produces two real numbers according to the repetition law

\[ D_j \mapsto \begin{cases} (+1, +1) & \text{if } D_j = 0, \\ (-1, -1) & \text{if } D_j = 1, \end{cases} \tag{10.4} \]

then n = 2k, and we say that the mapping is of rate half a bit per real symbol.


Since there is a natural correspondence between R² and C, i.e., between pairs of real
numbers and complex numbers (where a pair of real numbers (x,y) corresponds 
to the complex number x + iy), the rate of the above mapping (10.4) can also be 
stated as one bit per complex symbol. This may seem like an odd way of stating the 
rate, but it has some advantages that will become apparent later when we discuss 
the mapping of real (or complex) numbers to waveforms and the Nyquist Criterion. 
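The three example mappings of this section and their rates can be written out in a few lines. This is a minimal sketch; the function names are illustrative assumptions, and bits are represented as Python integers 0 and 1.

```python
from typing import List, Sequence

# Pair-to-symbol rule of (10.3): keys are (D_{2j-1}, D_{2j}).
PAIR_TO_SYMBOL = {(0, 0): +3.0, (0, 1): +1.0, (1, 1): -3.0, (1, 0): -1.0}


def map_antipodal(bits: Sequence[int]) -> List[float]:
    """Rule (10.2): one real symbol per bit, 0 -> +1 and 1 -> -1."""
    return [+1.0 if d == 0 else -1.0 for d in bits]


def map_pairs(bits: Sequence[int]) -> List[float]:
    """Rule (10.3): one real symbol per pair of bits; requires an even number of bits."""
    if len(bits) % 2 != 0:
        raise ValueError("the number of bits k must be even")
    return [PAIR_TO_SYMBOL[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def map_repetition(bits: Sequence[int]) -> List[float]:
    """Rule (10.4): two real symbols per bit, 0 -> (+1, +1) and 1 -> (-1, -1)."""
    symbols: List[float] = []
    for d in bits:
        s = +1.0 if d == 0 else -1.0
        symbols.extend([s, s])
    return symbols


def bits_per_real_symbol(k: int, n: int) -> float:
    """Rate of a mapping that turns k bits into n real symbols."""
    return k / n


bits = [0, 1, 1, 0]
for mapping in (map_antipodal, map_pairs, map_repetition):
    symbols = mapping(bits)
    print(mapping.__name__, symbols, bits_per_real_symbol(len(bits), len(symbols)))
```

For the four bits above the rates come out as one, two, and one half bit per real symbol, matching the discussion of (10.2), (10.3), and (10.4).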


10.4 Block-Mode Mapping of Bits to Real Numbers 


The examples we gave in Section 10.3 of mappings φ: {0,1}^k → R^n have something
in common. In each of those examples the mapping can be described as follows: the
data bits D_1, ..., D_k are first grouped into binary K-tuples; each K-tuple is then
mapped to a real N-tuple by applying some mapping enc: {0,1}^K → R^N; and the
so-produced real N-tuples are then concatenated to form the sequence X_1, ..., X_n,
where n = (k/K)N.
D_1, ..., D_K          | D_{K+1}, ..., D_{2K}          | ... | D_{k−K+1}, ..., D_k
    ↓ enc(·)           |     ↓ enc(·)                  |     |     ↓ enc(·)
X_1, ..., X_N          | X_{N+1}, ..., X_{2N}          | ... | X_{n−N+1}, ..., X_n
= enc(D_1, ..., D_K)   | = enc(D_{K+1}, ..., D_{2K})   | ... | = enc(D_{k−K+1}, ..., D_k)


Figure 10.1: Block-mode encoding.


In the first example K = N = 1 and the mapping of K-tuples to N-tuples is the 
mapping (10.2). In the second example K = 2 and N = 1 with the mapping (10.3). 
And in the third example K = 1 and N = 2 with the repetition mapping (10.4). 


To describe such mappings φ: {0,1}^k → R^n more formally we need the notion of
a binary-to-reals block encoder, which we define next.


Definition 10.4.1 ((K, N) Binary-to-Reals Block Encoder). A (K,N) binary-to- 
reals block encoder is a one-to-one mapping from the set of binary K-tuples to 
the set of real N-tuples, where K and N are positive integers. The rate of a (K,N) 
binary-to-reals block encoder is defined as 


\[ \frac{K}{N} \left[ \frac{\text{bit}}{\text{real symbol}} \right]. \]

Note that we shall sometimes omit the phrase “binary-to-reals” and refer to such 
an encoder as a (K,N) block encoder. Also note that “one-to-one” means that 
no two distinct binary K-tuples may be mapped to the same real N-tuple. 


We say that an encoder φ: {0,1}^k → R^n operates in block-mode using the
(K,N) binary-to-reals block encoder enc(·) if


1) k is divisible by K;
2) n is given by (k/K) N; and


3) φ(·) maps the binary sequence D_1, ..., D_k to the sequence X_1, ..., X_n by
parsing the sequence D_1, ..., D_k into consecutive length-K binary tuples and
by then concatenating the results of applying enc(·) to each such K-tuple as
in Figure 10.1.


If k is not divisible by K, we often introduce zero padding. In this case we
choose k′ to be the smallest integer that is no smaller than k and that is divisible
by K, i.e.,

\[ k' = \Bigl\lceil \frac{k}{K} \Bigr\rceil K \]

(where for every ξ ∈ R we use ⌈ξ⌉ to denote the smallest integer that is no smaller
than ξ, e.g., ⌈1.24⌉ = 2) and map D_1, ..., D_k to the sequence X_1, ..., X_{n′} where

\[ n' = \frac{k'}{K}\, N \]

D_1, ..., D_K          | D_{K+1}, ..., D_{2K}          | ... | D_{k′−K+1}, ..., D_k, 0, ..., 0
    ↓ enc(·)           |     ↓ enc(·)                  |     |     ↓ enc(·)
X_1, ..., X_N          | X_{N+1}, ..., X_{2N}          | ... | X_{n′−N+1}, ..., X_{n′}
= enc(D_1, ..., D_K)   | = enc(D_{K+1}, ..., D_{2K})   | ... | = enc(D_{k′−K+1}, ..., D_k, 0, ..., 0)


Figure 10.2: Block-mode encoding with zero padding.


by applying the (K, N) encoder in block-mode to the k′-length zero-padded binary
tuple

\[ D_1, \ldots, D_k, \underbrace{0, \ldots, 0}_{k' - k \text{ zeros}} \tag{10.5} \]

as in Figure 10.2.
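Block-mode encoding with zero padding can be sketched as follows. This is an illustration under assumptions: the function `encode_block_mode` and the two example encoders are hypothetical names, and a block encoder is modelled as any function that maps a K-tuple of bits to a tuple of N real numbers.

```python
from typing import Callable, List, Sequence, Tuple

BlockEncoder = Callable[[Tuple[int, ...]], Tuple[float, ...]]


def encode_block_mode(bits: Sequence[int], K: int, enc: BlockEncoder) -> List[float]:
    """Apply a (K, N) block encoder in block-mode, zero padding if K does not divide k."""
    k = len(bits)
    k_padded = -(-k // K) * K                  # ceil(k / K) * K in integer arithmetic
    padded = list(bits) + [0] * (k_padded - k)  # append k' - k zeros as in (10.5)
    symbols: List[float] = []
    for start in range(0, k_padded, K):
        # Parse consecutive length-K tuples and concatenate the encoder outputs.
        symbols.extend(enc(tuple(padded[start:start + K])))
    return symbols


# A (2, 1) encoder following the rule (10.3).
PAIR_TO_SYMBOL = {(0, 0): +3.0, (0, 1): +1.0, (1, 1): -3.0, (1, 0): -1.0}


def enc_pairs(block: Tuple[int, ...]) -> Tuple[float, ...]:
    return (PAIR_TO_SYMBOL[(block[0], block[1])],)


# A (1, 2) encoder following the repetition law (10.4).
def enc_repetition(block: Tuple[int, ...]) -> Tuple[float, ...]:
    s = +1.0 if block[0] == 0 else -1.0
    return (s, s)


print(encode_block_mode([0, 1, 1], 2, enc_pairs))    # k = 3 is padded to k' = 4
print(encode_block_mode([0, 1], 1, enc_repetition))  # no padding needed
```

In the first call the three bits 0, 1, 1 are padded with one zero, so the second block fed to the encoder is (1, 0).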


10.5 From Real Numbers to Waveforms with Linear Modulation 


There are numerous ways to map a sequence of real numbers X_1, ..., X_n to a
real-valued signal. Here we shall focus on mappings that have a linear structure. This
additional structure simplifies the implementation of the modulator and
demodulator. It will be described next.


Suppose we wish to modulate the k data bits D_1, ..., D_k, and suppose that we
have mapped these bits to the n real numbers X_1, ..., X_n. Here n can be smaller
than, equal to, or greater than k. The transmitted waveform X(·) in a linear modulation
scheme is then given by

\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, g_\ell(t), \quad t \in \mathbb{R}, \tag{10.6} \]

where the deterministic real waveforms g_1, ..., g_n are specified in advance, and
where A > 0 is a scaling factor. The waveform X(·) can thus be viewed as a
scaled-by-A linear combination of the tuple (g_1, ..., g_n) with the coefficients X_1, ..., X_n:

\[ \mathbf{X} = A \sum_{\ell=1}^{n} X_\ell\, \mathbf{g}_\ell . \tag{10.7} \]

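The transmitted energy is a random variable that is given by

\[
\begin{aligned}
\| \mathbf{X} \|_2^2 &= \int_{-\infty}^{\infty} X^2(t) \, dt \\
&= \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{n} X_\ell\, g_\ell(t) \Bigr)^2 dt \\
&= A^2 \sum_{\ell=1}^{n} \sum_{\ell'=1}^{n} X_\ell X_{\ell'} \int_{-\infty}^{\infty} g_\ell(t)\, g_{\ell'}(t) \, dt \\
&= A^2 \sum_{\ell=1}^{n} \sum_{\ell'=1}^{n} X_\ell X_{\ell'} \langle g_\ell, g_{\ell'} \rangle .
\end{aligned}
\]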


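The transmitted energy takes on a particularly simple form if the waveforms g_ℓ(·)
are orthonormal, i.e., if

\[ \langle g_\ell, g_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, n\}, \tag{10.8} \]

in which case the energy is given by

\[ \| \mathbf{X} \|_2^2 = A^2 \sum_{\ell=1}^{n} X_\ell^2, \quad \{g_\ell\} \text{ orthonormal}. \tag{10.9} \]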
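The double-sum expression for the energy and its orthonormal special case (10.9) can be compared on a small example. The sketch below assumes the inner products between the waveforms are supplied as a matrix (a Gram matrix); the function names and numerical values are illustrative assumptions.

```python
from typing import List, Sequence


def energy_general(A: float, X: Sequence[float], gram: Sequence[Sequence[float]]) -> float:
    """A^2 * sum over l, l' of X_l X_l' <g_l, g_l'>, with gram[l][l'] = <g_l, g_l'>."""
    n = len(X)
    return A ** 2 * sum(X[i] * X[j] * gram[i][j] for i in range(n) for j in range(n))


def energy_orthonormal(A: float, X: Sequence[float]) -> float:
    """Special case (10.9): A^2 * sum of X_l^2, valid when the waveforms are orthonormal."""
    return A ** 2 * sum(x * x for x in X)


def identity(n: int) -> List[List[float]]:
    """Gram matrix of n orthonormal waveforms, i.e. condition (10.8)."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


A = 2.0
X = [1.0, -1.0, 3.0]

# Orthonormal waveforms: both expressions agree.
print(energy_general(A, X, identity(3)), energy_orthonormal(A, X))

# Hypothetical non-orthogonal waveforms: the first two have inner product 0.5.
correlated = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(energy_general(A, X, correlated))
```

With correlated waveforms the cross terms no longer vanish, so the energy depends on the signs of the symbols and not only on their squares.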


As an exercise, the reader is encouraged to verify that there is no loss in generality
in assuming that the waveforms {g_ℓ} are orthonormal. More precisely:


Theorem 10.5.1. Suppose that the waveform X(·) is generated from the binary
k-tuple D_1, ..., D_k by applying the mapping φ: {0,1}^k → R^n and by then linearly
modulating the resulting n-tuple φ(D_1, ..., D_k) using the waveforms {g_ℓ}_{ℓ=1}^{n} as in
(10.6).


Then there exist an integer 1 ≤ n′ ≤ n; a mapping φ′: {0,1}^k → R^{n′}; and n′
orthonormal signals {ϕ_ℓ}_{ℓ=1}^{n′} such that if X′(·) is generated from D_1, ..., D_k by
applying linear modulation to φ′(D_1, ..., D_k) using the orthonormal waveforms
{ϕ_ℓ}_{ℓ=1}^{n′}, then X′(·) and X(·) are indistinguishable for every k-tuple D_1, ..., D_k.


Proof. The proof of this theorem is left as an exercise. 


Motivated by this theorem, we shall focus on linear modulation with orthonormal 
functions. But please note that even if the transmitted waveform satisfies (10.8), 
the received waveform might not. For example, the channel might consist of a 
linear filter that could destroy the orthogonality. 


10.6 Recovering the Signal Coefficients with a Matched Filter 


Suppose now that the binary k-tuple (D_1, ..., D_k) is mapped to the real n-tuple
(X_1, ..., X_n) using the mapping

\[ \varphi\colon \{0,1\}^k \to \mathbb{R}^n \tag{10.10} \]

and that the n-tuple (X_1, ..., X_n) is then mapped to the waveform

\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi_\ell(t), \quad t \in \mathbb{R}, \tag{10.11} \]

where ϕ_1, ..., ϕ_n are orthonormal:

\[ \langle \phi_\ell, \phi_{\ell'} \rangle = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, n\}. \tag{10.12} \]

How can we recover the k-tuple D_1, ..., D_k from X(·)? The decoder’s problem
is, of course, harder, because the decoder usually does not have access to the
transmitted waveform X(·) but only to the received waveform, which may be a
noisy and distorted version of X(·). Nevertheless, it is instructive to consider the
noiseless and distortionless problem first.


If we are able to recover the real numbers {X_ℓ}_{ℓ=1}^{n} from the received signal X(·),
and if the mapping φ: {0,1}^k → R^n is one-to-one (as we assume), then the data
bits {D_j}_{j=1}^{k} can be reconstructed from X(·). Thus, the question is how to recover
{X_ℓ}_{ℓ=1}^{n} from X(·). But this is easy if the functions {ϕ_ℓ}_{ℓ=1}^{n} are orthonormal,
because in this case, by Proposition 4.6.4 (i), X_ℓ is given by the scaled inner
product between X and ϕ_ℓ:

\[ X_\ell = \frac{1}{A} \langle \mathbf{X}, \phi_\ell \rangle, \quad \ell = 1, \ldots, n. \tag{10.13} \]

Consequently, we can compute X_ℓ by feeding X to a matched filter for ϕ_ℓ and
scaling the time-0 output by 1/A (Section 5.8). To recover {X_ℓ}_{ℓ=1}^{n} we thus need n
matched filters, one matched to each of the waveforms {ϕ_ℓ}.


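The implementation becomes much simpler if the functions {ϕ_ℓ} have an additional
structure, namely, if they are all time shifts of some function ϕ(·):

\[ \phi_\ell(t) = \phi(t - \ell T_s), \quad \bigl( \ell \in \{1, \ldots, n\},\ t \in \mathbb{R} \bigr). \tag{10.14} \]

In this case it follows from Corollary 5.8.3 that we can compute all the inner
products {⟨X, ϕ_ℓ⟩} using one matched filter whose impulse response is the mirror
image t ↦ ϕ(−t) of ϕ, denoted \vec{\phi} below, by feeding X
to the filter and sampling its output at the appropriate times:

\[
\begin{aligned}
X_\ell &= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \phi_\ell(\tau) \, d\tau \\
&= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \phi(\tau - \ell T_s) \, d\tau \\
&= \frac{1}{A} \int_{-\infty}^{\infty} X(\tau)\, \vec{\phi}(\ell T_s - \tau) \, d\tau \\
&= \frac{1}{A} \bigl( X \star \vec{\phi} \bigr)(\ell T_s), \quad \ell = 1, \ldots, n.
\end{aligned}
\tag{10.15}
\]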

Figure 10.3 demonstrates how the symbols {X_ℓ} can be recovered from X(·) using
a single matched filter if the pulses {ϕ_ℓ} satisfy (10.14).
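The recovery of the symbols by inner products with shifted pulses can be illustrated numerically. The sketch below rests on assumptions that are not in the text: it uses a unit-energy rectangular pulse of duration T_s (whose shifts by multiples of T_s are orthonormal), arbitrary values for A, T_s, and the symbols, and a midpoint Riemann sum in place of the integral.

```python
import math
from typing import Callable, List, Sequence


def phi(t: float, Ts: float) -> float:
    """Assumed pulse: unit-energy rectangle on [0, Ts)."""
    return 1.0 / math.sqrt(Ts) if 0.0 <= t < Ts else 0.0


def waveform(t: float, A: float, symbols: Sequence[float], Ts: float) -> float:
    """X(t) = A * sum over l = 1..n of X_l * phi(t - l Ts)."""
    return A * sum(x * phi(t - l * Ts, Ts) for l, x in enumerate(symbols, start=1))


def recover_symbols(signal: Callable[[float], float], A: float, n: int, Ts: float,
                    steps_per_period: int = 100) -> List[float]:
    """X_l = (1/A) * integral of X(tau) phi(tau - l Ts) d tau, via a midpoint sum."""
    dt = Ts / steps_per_period
    recovered = []
    for l in range(1, n + 1):
        acc = 0.0
        # The shifted pulse is supported on [l Ts, (l + 1) Ts), so integrate only there.
        for m in range(steps_per_period):
            tau = l * Ts + (m + 0.5) * dt
            acc += signal(tau) * phi(tau - l * Ts, Ts) * dt
        recovered.append(acc / A)
    return recovered


A, Ts = 2.0, 0.5                    # assumed amplitude and shift
symbols = [1.0, -1.0, 3.0, -3.0]    # assumed symbols X_1, ..., X_4
recovered = recover_symbols(lambda t: waveform(t, A, symbols, Ts), A, len(symbols), Ts)
print(recovered)
```

Each recovered value equals the matched-filter output at time ℓT_s scaled by 1/A; computing the inner product directly, as done here, is equivalent for this noiseless illustration.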


10.7 Pulse Amplitude Modulation 
Under Assumption (10.14), the transmitted signal X(-) in (10.11) is given by 


\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{10.16} \]

which is a special case of Pulse Amplitude Modulation (PAM), which we
describe next.


[Figure 10.3: Recovering the symbols from the transmitted waveform using a
matched filter when (10.14) is satisfied. Block diagram: X(·) is fed to the matched
filter, whose sampled output is A X_ℓ.]


In PAM, the data bits D_1, ..., D_k are mapped to real numbers X_1, ..., X_n, which
are then mapped to the waveform

\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{10.17} \]

for some scaling factor A > 0, some function g: R → R, and some constant T_s > 0.
The function g (always assumed Borel measurable) is called the pulse shape; the
constant T_s is called the baud period; and its reciprocal 1/T_s is called the baud
rate.¹ The units of T_s are seconds, and one often refers to the units of 1/T_s as real
symbols per second. PAM can thus be viewed as a special case of linear modulation
(10.6) with g_ℓ being given for every ℓ ∈ {1, ..., n} by the mapping t ↦ g(t − ℓT_s).
The signal (10.16) can be viewed as a PAM signal where the pulse shape ϕ satisfies
the orthonormality condition (10.14).


In this book we shall typically denote the PAM pulse shape by g. But we shall
use ϕ if we assume an additional orthonormality condition such as (10.12). In this
case we shall refer to 1/T_s as having units of real dimensions per second:

\[ \frac{1}{T_s} \left[ \frac{\text{real dimension}}{\text{sec}} \right], \qquad \phi \text{ satisfies (10.12)}. \tag{10.18} \]

Note that according to Theorem 10.5.1 there is no loss in generality in assuming
that the pulses {g_ℓ} are orthonormal. There is, however, a loss in generality in
assuming that they satisfy (10.14).


10.8 Constellations 


Recall that in PAM the data bits D_1, ..., D_k are first mapped to the real n-tuple
X_1, ..., X_n using a one-to-one mapping φ: {0,1}^k → R^n, and that these real
numbers are then mapped to the waveform X(·) via (10.17). Since there are only
2^k different binary k-tuples, it follows that each symbol X_ℓ can take on at most
2^k different values. The set of values that X_ℓ can take on may, in general, depend
on ℓ. The union of all these sets (over ℓ ∈ {1, ..., n}) is called the constellation of
the mapping φ(·). Denoting the constellation of φ(·) by 𝒳, we thus have that a real
number x is in 𝒳 if, and only if, for some choice of the binary k-tuple (d_1, ..., d_k)
and for some ℓ ∈ {1, ..., n} the ℓ-th component of φ((d_1, ..., d_k)) is equal to x.


¹ These terms honor the French engineer J.M.E. Baudot (1845-1903), who invented a telegraph
printing system.


For example, the constellation corresponding to the mapping (10.2) is the set
{−1, +1}; the constellation corresponding to (10.3) is the set {−3, −1, +1, +3};
and the constellation corresponding to (10.4) is the set {−1, +1}. In all these
examples, the constellation can be viewed as a special case of the constellation
with 2ν symbols

\[ \{ -(2\nu - 1), \ldots, -5, -3, -1, +1, +3, +5, \ldots, +(2\nu - 1) \} \tag{10.19} \]

for some positive integer ν. A less prevalent constellation is the constellation


\[ \{ -2, -1, +1, +2 \}. \tag{10.20} \]


The number of points in the constellation 𝒳 is just #𝒳, i.e., the number of
elements (cardinality) of the set 𝒳.


The minimum distance δ of a constellation is the Euclidean distance between
the closest distinct elements in the constellation:

\[ \delta \triangleq \min_{\substack{x, x' \in \mathcal{X} \\ x \neq x'}} |x - x'| . \tag{10.21} \]

The scaling of the constellation is arbitrary because of the scaling factor A in the
signal’s description. Thus, the signal A Σ_ℓ X_ℓ g(t − ℓT_s), where X_ℓ takes value in
the set {±1}, is of constellation {−1, +1}, but it can also be expressed in the form
A′ Σ_ℓ X′_ℓ g(t − ℓT_s), where A′ = 2A and X′_ℓ takes value in the set {−1/2, +1/2},
i.e., as a PAM signal of constellation {−1/2, +1/2}.


Different authors choose to normalize the constellation in different ways. One 
common normalization is to express the elements of the constellation as multiples 
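of the minimum distance. Thus, we would represent the constellation {−1, +1} as

\[ \Bigl\{ -\tfrac{1}{2}\delta,\ +\tfrac{1}{2}\delta \Bigr\} \]

and the constellation {−3, −1, +1, +3} as

\[ \Bigl\{ -\tfrac{3}{2}\delta,\ -\tfrac{1}{2}\delta,\ +\tfrac{1}{2}\delta,\ +\tfrac{3}{2}\delta \Bigr\}. \]

The normalized version of the constellation (10.19) is

\[ \Bigl\{ \pm\tfrac{2\nu - 1}{2}\delta,\ \ldots,\ \pm\tfrac{5}{2}\delta,\ \pm\tfrac{3}{2}\delta,\ \pm\tfrac{1}{2}\delta \Bigr\}. \tag{10.22} \]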

The second moment of a constellation 𝒳 is defined as

\[ \frac{1}{\#\mathcal{X}} \sum_{x \in \mathcal{X}} x^2 . \tag{10.23} \]

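The second moment of the constellation in (10.22) is given by

\[
\begin{aligned}
\frac{1}{\#\mathcal{X}} \sum_{x \in \mathcal{X}} x^2 &= \frac{1}{\nu} \sum_{\eta=1}^{\nu} (2\eta - 1)^2\, \frac{\delta^2}{4} \\
&= \frac{1}{3} \bigl( M^2 - 1 \bigr) \frac{\delta^2}{4},
\end{aligned}
\tag{10.24a}
\]

where

\[ M = 2\nu \tag{10.24b} \]

is the number of points in the constellation, and where (10.24a) and (10.24b) can be
verified using the identity

\[ \sum_{\eta=1}^{\nu} (2\eta - 1)^2 = \frac{1}{3}\, \nu \bigl( 4\nu^2 - 1 \bigr), \quad \nu = 1, 2, \ldots \tag{10.25} \]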
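The minimum distance and the second moment are easy to compute for the constellation (10.19), and the closed form (M² − 1)δ²/12 can be checked exactly with rational arithmetic. This is a minimal sketch; the function names are illustrative assumptions.

```python
from fractions import Fraction
from typing import List, Sequence


def pam_constellation(nu: int) -> List[Fraction]:
    """The 2*nu points -(2 nu - 1), ..., -3, -1, +1, +3, ..., +(2 nu - 1) of (10.19)."""
    return [Fraction(s) for s in range(-(2 * nu - 1), 2 * nu, 2)]


def minimum_distance(points: Sequence[Fraction]) -> Fraction:
    """Smallest distance between distinct points, as in (10.21)."""
    ordered = sorted(points)
    return min(b - a for a, b in zip(ordered, ordered[1:]))


def second_moment(points: Sequence[Fraction]) -> Fraction:
    """Average of the squared points, as in (10.23)."""
    return sum(x * x for x in points) / len(points)


for nu in range(1, 6):
    points = pam_constellation(nu)
    M = len(points)
    delta = minimum_distance(points)
    closed_form = Fraction(M * M - 1, 12) * delta ** 2
    print(M, delta, second_moment(points), closed_form)
```

Each printed row shows the second moment computed from the definition next to the closed form; the two coincide for every ν tried.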


10.9 Design Considerations 


Designing a communication system employing PAM with a block encoder entails 
making choices. We need to choose the PAM parameters A, T_s, and g, and we
need to choose a (K,N) block encoder enc(·). These choices greatly influence the
overall system characteristics such as the transmitted power, bandwidth, and the 
performance of the system in the presence of noise. To design a system well, we 
must understand the effect of the design choices on the overall system at three 
levels. At the first level we must understand which design parameters influence 
which overall system characteristics. At the second level we must understand 
how the design parameters influence the system. And at the third level we must 
understand how to choose the design parameters so as to optimize the system 
characteristics subject to the given constraints. 


In this book we focus on the first two levels. The third requires tools from Infor- 
mation Theory and from Coding Theory that are beyond the scope of this book. 
Here we offer a preview of the first level. We thus briefly and informally explain 
which design choices influence which overall system properties. 


To simplify the preview, we shall assume in this section that the time shifts of the
pulse shape by integer multiples of the baud period are orthonormal. Consequently,
we shall denote the pulse shape by ϕ and assume that (10.12) holds. We shall also
assume that k and n tend to infinity as in the bi-infinite block mode discussed in
Section 14.5.2. Roughly speaking, this assumption is tantamount to the assumption
that the system has been running since time −∞ and that it will continue running
until time +∞.

Our discussion is extremely informal, and we apologize to the reader for discussing 
concepts that we have not yet defined. Readers who are aggravated by this practice 
may choose to skip this section; the issues will be revisited in Chapter 29 after 
everything has been defined and all the claims proved. 


The key observation we wish to highlight is that, to a great extent, 
the choice of the block encoder enc(·) can be decoupled from the
choice of the pulse shape. The bandwidth and power spectral
density depend hardly at all on enc(·) and very much on the pulse
shape, whereas the probability of error on the white Gaussian noise
channel depends very much on enc(·) and not at all on the pulse
shape φ.


This observation greatly simplifies the design problem because it means that, rather 
than optimizing over φ and enc(·) jointly, we can choose each of them separately.


We next briefly discuss the different overall system characteristics and which design 
choices influence them. 


Data Rate: The data rate R_b that the system supports is determined by the baud
period T_s and by the rate K/N of the encoder. It is given by


    R_b = \frac{1}{T_s} \, \frac{K}{N} \quad \left[ \frac{\text{bit}}{\text{sec}} \right].


Power: The transmitted power does not depend on the pulse shape φ (Theorem 14.5.2).
It is determined by the amplitude A, the baud period T_s, and by
the block encoder enc(·). In fact, if the block encoder enc(·) is such that when it
is fed the data bits it produces zero-mean symbols that are uniformly distributed
over the constellation, then the transmitted power is determined by A, T_s, and the
second moment of the constellation only.


Power Spectral Density: If the block encoder enc(·) is such that when it is fed
the data bits it produces zero-mean and uncorrelated symbols of equal variance,
then the power spectral density is determined by A, T_s, and φ only; it is unaffected
by enc(·) (Section 15.4).


Bandwidth: The bandwidth of the transmitted waveform is equal to the bandwidth
of the pulse shape φ (Theorem 15.4.1). We will see in Chapter 11 that
for the orthonormality (10.12) to hold, the bandwidth W of the pulse shape must
satisfy

    W \ge \frac{1}{2 T_s}.

In Chapter 11 we shall also see how to design φ so as to satisfy (10.12) and so as
to have its bandwidth as close as we wish to 1/(2T_s).²


Probability of Error: It is a remarkable fact that the pulse shape φ does not affect
the performance of the system on the additive white Gaussian noise channel. Performance
is determined only by A, T_s, and the block encoder enc(·) (Section 26.5.2).
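
Of the characteristics listed above, the data rate and the bandwidth lower bound are the two that can be computed directly from the design parameters. The short Python sketch below does so for one design point. The function names and the numbers (a (1,3) encoder and a baud period of one microsecond) are hypothetical and chosen only for illustration.

```python
def data_rate(K, N, Ts):
    """Data rate (1/Ts) * (K/N), in bits per second."""
    return K / (N * Ts)

def min_bandwidth(Ts):
    """Smallest bandwidth 1/(2*Ts), in Hz, compatible with (10.12)."""
    return 1.0 / (2.0 * Ts)

# Hypothetical design point (an assumption, not taken from the text).
K, N, Ts = 1, 3, 1e-6
rate = data_rate(K, N, Ts)
w_min = min_bandwidth(Ts)
print(f"data rate ~ {rate:.0f} bit/sec, bandwidth at least {w_min:.0f} Hz")
```

Note that the baud period enters both quantities, whereas the encoder rate K/N enters only the data rate.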


² Information-theoretic considerations suggest that this is a good approach.

The preceding discussion focused on PAM, but many of the results also hold for
Quadrature Amplitude Modulation, which is discussed in Chapters 16, 18, and 28. 


10.10 Some Implementation Considerations 


It is instructive to consider some of the issues related to the generation of a PAM 
signal 


    X(t) = A \sum_{\ell=1}^{n} X_\ell \, g(t - \ell T_s), \qquad t \in \mathbb{R}.    (10.26)


Here we focus on delay, causality, and digital implementation. 


10.10.1 Delay 


To illustrate the delay issue in PAM, suppose that the pulse shape g(·) is strictly
positive. In this case we note that, irrespective of which epoch t′ ∈ ℝ we consider,
the calculation of X(t′) requires knowledge of the entire n-tuple X_1,...,X_n. Since
the sequence X_1,...,X_n cannot typically be determined in its entirety unless the
entire sequence D_1,...,D_k is determined first, it follows that, when g(·) is strictly
positive, the modulator cannot produce X(t′) before observing the entire data
sequence D_1,...,D_k. And this is true for any t′ ∈ ℝ! Since in the back of our
minds we think about D_1,...,D_k as the data bits that will be sent during the
entire life of the system or, at least, from the moment it is turned on until it is
shut off, it is unrealistic to expect the modulator to observe the entire sequence
D_1,...,D_k before producing any input to the channel.


The engineering solution to this problem is to find some positive integer L such
that, for all practical purposes, g(t) is zero whenever |t| > L T_s, i.e.,


    g(t) \approx 0, \qquad |t| > L T_s.    (10.27)


In this case we have that, irrespective of t′ ∈ ℝ, only 2L + 1 terms (approximately)
determine X(t′). Indeed, if κ is an integer such that


    \kappa T_s \le t' < (\kappa + 1) T_s,    (10.28)


then

    X(t') \approx A \sum_{\ell = \max\{1, \kappa - L\}}^{\min\{n, \kappa + L\}} X_\ell \, g(t' - \ell T_s), \qquad \kappa T_s \le t' < (\kappa + 1) T_s,    (10.29)


where the sum is assumed to be zero if κ + L < 1.


Thus, if (10.27) holds, then the approximate calculation of X(t′) can be performed
without knowledge of the entire sequence X_1,...,X_n, and the modulator can start
producing the waveform X(·) as soon as it knows X_1,...,X_L.
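
The truncation of the PAM sum to roughly 2L + 1 terms can be sketched in Python as follows. The Gaussian-shaped pulse, the symbol pattern, and all function names are our own illustrative assumptions; the pulse is strictly positive, so the full sum really does involve every symbol, yet the truncated sum is numerically indistinguishable from it once L is moderately large.

```python
import math

def pam_exact(t, symbols, g, A, Ts):
    """Full PAM sum: A * sum over l = 1..n of X_l * g(t - l*Ts)."""
    return A * sum(x * g(t - l * Ts) for l, x in enumerate(symbols, start=1))

def pam_truncated(t, symbols, g, A, Ts, L):
    """Truncated sum: keep only the symbols whose index is within L of kappa."""
    n = len(symbols)
    kappa = math.floor(t / Ts)            # kappa*Ts <= t < (kappa+1)*Ts
    lo, hi = max(1, kappa - L), min(n, kappa + L)
    # range() is empty when hi < lo, so the sum is then zero.
    return A * sum(symbols[l - 1] * g(t - l * Ts) for l in range(lo, hi + 1))

# Illustrative choices (assumptions): a strictly positive, fast-decaying
# pulse and a fixed +/-1 symbol pattern of length 40.
Ts, A, L = 1.0, 1.0, 4
gauss = lambda t: math.exp(-(t / Ts) ** 2)
symbols = [1, -1, -1, 1, 1, -1, 1, -1] * 5

t_prime = 17.3 * Ts
error = abs(pam_exact(t_prime, symbols, gauss, A, Ts)
            - pam_truncated(t_prime, symbols, gauss, A, Ts, L))
print(f"truncation error at t' = {t_prime}: {error:.2e}")
```

For a pulse that decays more slowly than this Gaussian, a larger L would be needed for the same accuracy.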
10.10.2 Causality


The reader may object to the fact that, even if (10.27) holds, the signal X(·) may
be nonzero at negative times. It might therefore seem as though the transmitter 
needs to transmit a signal before the system has been turned on and that, worse 
still, this signal depends on the data bits that will be fed to the system in the 
future when the system is turned on. But this is not really an issue. It all has 
to do with how we define the epoch t = 0, i.e., to what physical time instant 
does t = 0 correspond. We never said it corresponded to the instant when the 
system was turned on and, in fact, there is no reason to set the time origin at 
that time instant or at the “Big Bang.” For example, we can set the time origin 
at L T_s seconds-past-system-turn-on, and the problem disappears. Similarly, if the
transmitted waveform depends on X_1,...,X_L, and if these real numbers can only
be computed once the data bits D_1,...,D_κ have been fed to the encoder, then it
would make sense to set the time origin to the moment at which the last of these κ
data bits has been fed to the encoder.


Some problems in Digital Communications that appear like tough causality prob- 
lems end up being easily solved by time delays and the redefinition of the time 
origin. Others can be much harder. It is sometimes difficult for the novice to de- 
termine which causality problem is of the former type and which of the latter. As 
a rule of thumb, you should be extra cautious when the system contains feedback 
loops. 


10.10.3 Digital Implementation 


Even when all the symbols among X_1,...,X_n that are relevant for the calculation
of X(t′) are known, the actual computation may be tricky, particularly if the
formula describing the pulse shape is difficult to implement in hardware. In such
cases one may opt for a digital implementation using look-up tables. The idea is
to compute only samples of X(·) and to then interpolate using a digital-to-analog
(D/A) converter and an anti-aliasing filter. The samples must be computed at a
rate determined by the Sampling Theorem, i.e., at least once every 1/(2W) seconds,
where W is the bandwidth of the pulse shape.


The computation of the values of X(·) at its samples can be done by choosing L
sufficiently large so that (10.27) holds and by then approximating the sum (10.26)
for t′ satisfying (10.28) by the sum (10.29). The samples of this latter sum can be
computed with a digital computer or, as is more common if the symbols take on a
finite (and small) number of values, using a pre-programmed look-up table. The
size of the look-up table thus depends on two parameters: the number of samples
one needs to compute every T_s seconds (determined via the bandwidth of g(·) and
the Sampling Theorem), and the number of addresses needed (as determined by L
and by the constellation size).
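
A look-up table of this kind can be sketched in Python as follows. This is only an illustration under our own assumptions: the table is addressed by the window of 2L + 1 symbols around the current baud interval, edge effects near the first and last symbols are ignored, and the pulse shape and all names are hypothetical.

```python
import itertools
import math

def build_lut(constellation, g, A, Ts, L, samples_per_baud):
    """Tabulate the truncated PAM sum on one baud interval.

    The key is the window (X_{kappa-L}, ..., X_{kappa+L}); the value holds the
    samples at t' = kappa*Ts + m*Ts/samples_per_baud for m = 0, 1, ...
    """
    offsets = range(-L, L + 1)
    lut = {}
    for window in itertools.product(constellation, repeat=2 * L + 1):
        lut[window] = [
            A * sum(x * g(m * Ts / samples_per_baud - off * Ts)
                    for off, x in zip(offsets, window))
            for m in range(samples_per_baud)
        ]
    return lut

Ts, A, L, S = 1.0, 1.0, 2, 4
pulse = lambda t: math.exp(-(t / Ts) ** 2)     # hypothetical pulse shape
lut = build_lut((-1, +1), pulse, A, Ts, L, S)
print(len(lut), "addresses,", S, "samples each")
```

With a binary constellation and L = 2 the table has 2^5 = 32 addresses. Under this addressing scheme the number of addresses grows exponentially in L, which is one reason to pick L no larger than the required accuracy demands.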
10.11 Exercises


Exercise 10.1 (Exploiting Orthogonality). Let the energy-limited real signals φ_1 and φ_2
be orthogonal, and let A^(1) and A^(2) be positive constants. Let the waveform X be given
by

    X = \bigl( A^{(1)} X^{(1)} + A^{(2)} X^{(2)} \bigr) \, \phi_1 + \bigl( A^{(1)} X^{(1)} - A^{(2)} X^{(2)} \bigr) \, \phi_2,

where X^(1) and X^(2) are unknown real numbers. How can you recover X^(1) and X^(2)
from X?


Exercise 10.2 (More Orthogonality). Extend Exercise 10.1 to the case where #1,...@n 
are orthonormal; 


x= (arama he pat aA xkO Ve Spe 


EA (com AOx rave ® al AMX ) dy: 


and where the real numbers a”) for u,v € {1,..., 7} satisfy the orthogonality condition 


FS bin) : ife=e', Pe {i } 
y QS ONE . Lye Ree 1 
= 0 ifesd, 


Exercise 10.3 (A Constellation and its Second Moment). What is the constellation
corresponding to the (1,3) binary-to-reals block encoder that maps 0 to (+1,+2,+2) and
maps 1 to (−1,−2,−2)? What is its second moment? Let the real symbols (X_ℓ, ℓ ∈ ℤ)
be generated from IID random bits (D_j, j ∈ ℤ) in block mode using this block encoder.
Compute


    \lim_{L \to \infty} \frac{1}{2L + 1} \sum_{\ell = -L}^{L} \mathsf{E}\bigl[ X_\ell^2 \bigr].

Exercise 10.4 (Orthonormal Signal Representation). Prove Theorem 10.5.1. 


Hint: Recall the Gram-Schmidt procedure. 


Exercise 10.5 (Unbounded PAM Signal). Consider the formal expression


    X(t) = \sum_{\ell = -\infty}^{\infty} X_\ell \, \operatorname{sinc}\Bigl( \frac{t}{T_s} - \ell \Bigr), \qquad t \in \mathbb{R}.


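(i) Show that even if the X_ℓ’s can only take on the values ±1, the value of X(T_s/2)
can be arbitrarily high. That is, find a sequence {x_ℓ} such that x_ℓ ∈ {+1, −1}
for every ℓ ∈ ℤ and

    \lim_{L \to \infty} \sum_{\ell = -L}^{L} x_\ell \, \operatorname{sinc}\Bigl( \frac{1}{2} - \ell \Bigr) = \infty.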


(ii) Suppose now that g: ℝ → ℝ satisfies


    |g(t)| \le \frac{\beta}{1 + |t / T_s|^{1 + \alpha}}, \qquad t \in \mathbb{R},

for some α, β > 0. Show that if for some γ > 0 we have |x_ℓ| ≤ γ for all ℓ ∈ ℤ, then
the sum

    \sum_{\ell = -\infty}^{\infty} x_\ell \, g(t - \ell T_s)

converges at every t and is a bounded function of t.


Exercise 10.6 (Etymology). Let g be an integrable real signal. Express the frequency 
response of the matched filter for g in terms of the FT of g. Repeat when g is a complex 
signal. Can you guess the origin of the term “Matched Filter”? 


Hint: Recall the notion of a “matched impedance.” 


Exercise 10.7 (Recovering the Symbols from a Filtered PAM Signal). Let X(·) be the
PAM signal (10.17), where A > 0, and where g(t) is zero for |t| > T_s/2 and positive for
|t| < T_s/2.


(i) Suppose that X(·) is fed to a filter of impulse response h: t ↦ I{|t| < T_s/2}. Is
it true that for every ℓ ∈ {1,...,n} one can recover X_ℓ from the filter’s output at
time ℓT_s? If so, how?

(ii) Suppose now that the filter’s impulse response is h: t ↦ I{−T_s/2 < t < 3T_s/4}.
Can one always recover X_ℓ from the filter’s output at time ℓT_s? Can one recover
the sequence (X_1,...,X_n) from the n samples of the filter’s output at the times
T_s, ..., nT_s?


Exercise 10.8 (Continuous Phase Modulation). In Continuous Phase Modulation (CPM)
the symbols (X_ℓ) are mapped to the waveform


    X(t) = A \cos\Bigl( 2\pi f_c t + 2\pi h \sum_{\ell = -\infty}^{\infty} X_\ell \, q(t - \ell T_s) \Bigr), \qquad t \in \mathbb{R},


where f_c, h > 0 are constants and q is a mapping from ℝ to ℝ. Is CPM a special case of
linear modulation?


Chapter 11 


Nyquist’s Criterion 


11.1 Introduction 


In Section 10.7 we discussed the benefit of choosing the pulse shape φ in Pulse
Amplitude Modulation so that its time shifts by integer multiples of the baud
period T_s be orthonormal. We saw that if the real transmitted signal is given by


    X(t) = A \sum_{\ell=1}^{n} X_\ell \, \phi(t - \ell T_s), \qquad t \in \mathbb{R},


where for all integers ℓ, ℓ′ ∈ {1,...,n}

    \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\},


then

    X_\ell = \frac{1}{A} \int_{-\infty}^{\infty} X(t) \, \phi(t - \ell T_s) \, dt, \qquad \ell = 1, \ldots, n,


and all the inner products

    \int_{-\infty}^{\infty} X(t) \, \phi(t - \ell T_s) \, dt, \qquad \ell = 1, \ldots, n


can be computed using one circuit by feeding the signal X(·) to a matched filter of
impulse response t ↦ φ(−t) and sampling the output at the times t = ℓT_s, for ℓ = 1,...,n.
(In the complex case the matched filter is of impulse response t ↦ φ*(−t).)
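
The recovery of the symbols by inner products can be illustrated numerically. The Python sketch below uses, purely as an assumption for illustration, a unit-energy rectangular pulse supported on an interval of length T_s (whose shifts trivially satisfy the orthonormality condition) and approximates each integral by a midpoint Riemann sum. The symbol values and all names are hypothetical.

```python
import math

Ts, A = 1.0, 2.0
symbols = [1.0, -1.0, 3.0, -3.0]           # hypothetical symbols X_1, ..., X_n

def phi(t):
    """Unit-energy rectangular pulse supported on [-Ts/2, Ts/2)."""
    return 1.0 / math.sqrt(Ts) if -Ts / 2 <= t < Ts / 2 else 0.0

def X(t):
    """PAM signal A * sum over l of X_l * phi(t - l*Ts)."""
    return A * sum(x * phi(t - l * Ts) for l, x in enumerate(symbols, start=1))

def inner_product(l, steps_per_baud=200):
    """Midpoint-rule approximation of the integral of X(t) * phi(t - l*Ts)."""
    dt = Ts / steps_per_baud
    n_steps = (len(symbols) + 1) * steps_per_baud   # covers [0, (n+1)*Ts]
    return dt * sum(X((k + 0.5) * dt) * phi((k + 0.5) * dt - l * Ts)
                    for k in range(n_steps))

# Dividing each inner product by A should return the transmitted symbols.
recovered = [inner_product(l) / A for l in range(1, len(symbols) + 1)]
print(recovered)
```

The rectangular pulse is used here only because its orthonormality is obvious; as the chapter goes on to explain, it is not bandlimited.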


In this chapter we shall address the design of and the limitations on signals that are
orthogonal to their time-shifts. While our focus so far has been on real functions φ,
for reasons that will become apparent in Chapter 16 when we discuss Quadrature
Amplitude Modulation, we prefer to generalize the discussion and allow φ to be
complex. The main results of this chapter are Corollary 11.3.4 and Corollary 11.3.5.


An obvious way of choosing a signal φ that is orthogonal to its time shifts by
nonzero integer multiples of T_s is by choosing a pulse that is zero outside some
interval of length T_s, say [−T_s/2, T_s/2). This guarantees that the pulse and its
time shifts by nonzero integer multiples of T_s do not overlap in time and that they
are thus orthogonal. But this choice limits us to pulses of infinite bandwidth,
because no nonzero bandlimited signal can vanish outside a finite (time) interval
(Theorem 6.8.2).

Fortunately, as we shall see, there exist signals that are orthogonal to their time 
shifts and that are also bandlimited. This does not contradict Theorem 6.8.2 
because these signals are not time-limited. They are orthogonal to their time 
shifts in spite of overlapping with them in time. 

Since we have in mind using the pulse to send a very large number of symbols n 
(where n corresponds to the number of symbols sent during the lifetime of the 
system) we shall strengthen the orthonormality requirement to 


    \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi^{*}(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\}, \qquad \text{for all integers } \ell, \ell'    (11.1)


and not only to those ℓ, ℓ′ in {1,...,n}. We shall refer to Condition (11.1) as
saying that “the time shifts of φ by integer multiples of T_s are orthonormal.”


Condition (11.1) can also be phrased as a condition on φ’s self-similarity function,
which we introduce next. 


11.2 The Self-Similarity Function of Energy-Limited Signals 


We next introduce the self-similarity function of energy-limited signals. This 
term is not standard; more common in the literature is the term “autocorrelation 
function.” I prefer “self-similarity function,” which was proposed to me by Jim 
Massey, because it reduces the risk of confusion with the autocovariance function 
and the autocorrelation function of stochastic processes. There is nothing random 
in our current setup. 


Definition 11.2.1 (Self-Similarity Function). The self-similarity function R_vv
of an energy-limited signal v ∈ ℒ_2 is defined as the mapping


    R_{vv} \colon \tau \mapsto \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt, \qquad \tau \in \mathbb{R}.    (11.2)


If v is real, then the self-similarity function has a nice pictorial interpretation: one
plots the original signal and the result of shifting the signal by τ on the same graph,
and one then takes the pointwise product and integrates over time.


The main properties of the self-similarity function are summarized in the following 
proposition. 


Proposition 11.2.2 (Properties of the Self-Similarity Function). Let R_vv be the
self-similarity function of some energy-limited signal v ∈ ℒ_2.


(i) Value at zero:

    R_{vv}(0) = \int_{-\infty}^{\infty} |v(t)|^2 \, dt.    (11.3)

(ii) Maximum at zero:

    |R_{vv}(\tau)| \le R_{vv}(0), \qquad \tau \in \mathbb{R}.    (11.4)

(iii) Conjugate symmetry:

    R_{vv}(-\tau) = R_{vv}^{*}(\tau), \qquad \tau \in \mathbb{R}.    (11.5)

(iv) Integral representation:

    R_{vv}(\tau) = \int_{-\infty}^{\infty} |\hat{v}(f)|^2 \, e^{i 2 \pi f \tau} \, df, \qquad \tau \in \mathbb{R},    (11.6)

where v̂ is the ℒ_2-Fourier Transform of v.

(v) Uniform Continuity: R_vv is uniformly continuous.

(vi) Convolution Representation:

    R_{vv}(\tau) = \bigl( v \star \overleftarrow{v}^{*} \bigr)(\tau), \qquad \tau \in \mathbb{R},    (11.7)

where \overleftarrow{v}^{*} denotes the mapping t ↦ v*(−t).


Proof. Part (i) follows by substituting τ = 0 in (11.2).


Part (ii) follows by noting that R_vv(τ) is the inner product between the mapping
t ↦ v(t + τ) and the mapping t ↦ v(t); by the Cauchy-Schwarz Inequality; and by
noting that both of the above mappings have the same energy, namely, the energy
of v:

    |R_{vv}(\tau)| = \left| \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt \right|
        \le \left( \int_{-\infty}^{\infty} |v(t + \tau)|^2 \, dt \right)^{1/2} \left( \int_{-\infty}^{\infty} |v^{*}(t)|^2 \, dt \right)^{1/2}
        = \|v\|_2^2
        = R_{vv}(0), \qquad \tau \in \mathbb{R}.


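Part (iii) follows from the substitution s ≜ t + τ in the following:


    R_{vv}(\tau) = \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt
        = \int_{-\infty}^{\infty} v(s) \, v^{*}(s - \tau) \, ds
        = \left( \int_{-\infty}^{\infty} v(s - \tau) \, v^{*}(s) \, ds \right)^{*}
        = R_{vv}^{*}(-\tau), \qquad \tau \in \mathbb{R}.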


Part (iv) follows from the representation of R_vv(τ) as the inner product between
the mapping t ↦ v(t + τ) and the mapping t ↦ v(t); by Parseval’s Theorem;
and by noting that the ℒ_2-Fourier Transform of the mapping t ↦ v(t + τ) is the
(equivalence class of the) mapping f ↦ e^{i2πfτ} v̂(f):


    R_{vv}(\tau) = \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt
        = \bigl\langle t \mapsto v(t + \tau), \; t \mapsto v(t) \bigr\rangle
        = \bigl\langle f \mapsto e^{i 2 \pi f \tau} \hat{v}(f), \; f \mapsto \hat{v}(f) \bigr\rangle
        = \int_{-\infty}^{\infty} e^{i 2 \pi f \tau} \, |\hat{v}(f)|^2 \, df, \qquad \tau \in \mathbb{R}.

Part (v) follows from the integral representation of Part (iv) and from the
integrability of the function f ↦ |v̂(f)|². See, for example, the proof of (Katznelson,
1976, Section VI, Theorem 1.2).


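Part (vi) follows from the substitution s ≜ t + τ and by rearranging terms:


    R_{vv}(\tau) = \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt
        = \int_{-\infty}^{\infty} v(s) \, v^{*}(s - \tau) \, ds
        = \int_{-\infty}^{\infty} v(s) \, \overleftarrow{v}^{*}(\tau - s) \, ds
        = \bigl( v \star \overleftarrow{v}^{*} \bigr)(\tau), \qquad \tau \in \mathbb{R}.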
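
The first three properties can be checked numerically on a concrete signal. The Python sketch below approximates the defining integral (11.2) by a Riemann sum for a test signal of our own choosing (a Gaussian envelope times a complex exponential), for which a direct calculation gives |R_vv(τ)| = √(π/2) e^{−τ²/2}. The signal, the integration range, and the step size are all illustrative assumptions.

```python
import cmath
import math

def v(t, f0=0.3):
    """Hypothetical test signal: a Gaussian envelope times a complex exponential."""
    return math.exp(-t * t) * cmath.exp(2j * math.pi * f0 * t)

def self_similarity(sig, tau, t_min=-10.0, t_max=10.0, steps=4000):
    """Riemann-sum approximation of the integral of sig(t + tau) * conj(sig(t))."""
    dt = (t_max - t_min) / steps
    total = 0j
    for k in range(steps):
        t = t_min + k * dt
        total += sig(t + tau) * sig(t).conjugate()
    return total * dt

R0 = self_similarity(v, 0.0)       # value at zero: the energy of v
R_pos = self_similarity(v, 0.7)
R_neg = self_similarity(v, -0.7)

print("energy            :", R0.real)
print("maximum at zero   :", abs(R_pos) <= R0.real)
print("conjugate symmetry:", abs(R_neg - R_pos.conjugate()) < 1e-9)
```

The sum is truncated to a finite interval, which is harmless here only because the test signal decays very quickly.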


With the above definition we can restate the orthonormality condition (11.1) in
terms of the self-similarity function R_φφ of φ:


Proposition 11.2.3 (Shift-Orthonormality and Self-Similarity). If φ is energy-limited,
then the shift-orthonormality condition


    \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi^{*}(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\}, \qquad \ell, \ell' \in \mathbb{Z}    (11.8)


is equivalent to the condition

    R_{\phi\phi}(\ell T_s) = \mathrm{I}\{\ell = 0\}, \qquad \ell \in \mathbb{Z}.    (11.9)


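Proof. The proposition follows by substituting s ≜ t − ℓ′T_s in the LHS of (11.8)
to obtain


    \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi^{*}(t - \ell' T_s) \, dt
        = \int_{-\infty}^{\infty} \phi\bigl( s + (\ell' - \ell) T_s \bigr) \, \phi^{*}(s) \, ds
        = R_{\phi\phi}\bigl( (\ell' - \ell) T_s \bigr).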


At this point, Proposition 11.2.3 does not seem particularly helpful because Con- 
dition (11.9) is not easy to verify. But, as we shall see in the next section, this 
condition can be phrased very elegantly in the frequency domain. 
11.3 Nyquist’s Criterion


Definition 11.3.1 (Nyquist Pulse). We say that a complex signal v: ℝ → ℂ is a
Nyquist Pulse of parameter T_s if


    v(\ell T_s) = \mathrm{I}\{\ell = 0\}, \qquad \ell \in \mathbb{Z}.    (11.10)

Theorem 11.3.2 (Nyquist’s Criterion). Let T_s > 0 be given, and let the signal v(·)
be given by

    v(t) = \int_{-\infty}^{\infty} g(f) \, e^{i 2 \pi f t} \, df, \qquad t \in \mathbb{R},    (11.11)


for some integrable function g: f ↦ g(f). Then v(·) is a Nyquist Pulse of parameter
T_s if, and only if,


    \lim_{J \to \infty} \int_{-1/(2T_s)}^{1/(2T_s)} \Bigl| T_s - \sum_{j=-J}^{J} g\Bigl( f + \frac{j}{T_s} \Bigr) \Bigr| \, df = 0.    (11.12)


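Note 11.3.3. Condition (11.12) is sometimes written imprecisely¹ in the form


    \sum_{j=-\infty}^{\infty} g\Bigl( f + \frac{j}{T_s} \Bigr) = T_s, \qquad -\frac{1}{2T_s} \le f \le \frac{1}{2T_s},    (11.13)

or, in view of the periodicity of the LHS of (11.13), as

    \sum_{j=-\infty}^{\infty} g\Bigl( f + \frac{j}{T_s} \Bigr) = T_s, \qquad f \in \mathbb{R}.    (11.14)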


Neither form is mathematically precise. 


Proof. We will show that v(−ℓT_s) is the ℓ-th Fourier Series Coefficient of the
function²

    f \mapsto \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} g\Bigl( f + \frac{j}{T_s} \Bigr), \qquad -\frac{1}{2T_s} \le f < \frac{1}{2T_s}.    (11.15)


It will then follow that the condition that v is a Nyquist Pulse of parameter T_s is
equivalent to the condition that the function in (11.15) has Fourier Series Coeffi- 
cients that are all zero except for the zeroth coefficient, which is one. The theorem 
will then follow by noting that a function is indistinguishable from a constant if, 
and only if, all but its zeroth Fourier Series Coefficient are zero. (This can be 
proved by applying Theorem A.2.3 with g; chosen as the constant function.) The 


¹ There is no guarantee that the sum converges at every frequency f.
² Since, by hypothesis, g is integrable, it follows that the sum in (11.15) converges in the ℒ_1
sense, i.e., that there exists some integrable function s_∞ such that

    \lim_{J \to \infty} \int_{-1/(2T_s)}^{1/(2T_s)} \Bigl| s_\infty(f) - \sum_{j=-J}^{J} g\Bigl( f + \frac{j}{T_s} \Bigr) \Bigr| \, df = 0.

By writing \sum_{j=-\infty}^{\infty} g(f + j/T_s) we are referring to this function s_∞.
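value of the constant can be computed from the zeroth Fourier Series Coefficient.
To conclude the proof we thus need to relate v(−ℓT_s) to the ℓ-th Fourier Series
Coefficient of the function in (11.15). The calculation is straightforward: for every
integer ℓ,


    v(-\ell T_s) = \int_{-\infty}^{\infty} g(f) \, e^{-i 2 \pi f \ell T_s} \, df
        = \sum_{j=-\infty}^{\infty} \int_{\frac{j}{T_s} - \frac{1}{2T_s}}^{\frac{j}{T_s} + \frac{1}{2T_s}} g(f) \, e^{-i 2 \pi f \ell T_s} \, df
        = \sum_{j=-\infty}^{\infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} g\Bigl( \tilde{f} + \frac{j}{T_s} \Bigr) \, e^{-i 2 \pi (\tilde{f} + j/T_s) \ell T_s} \, d\tilde{f}
        = \sum_{j=-\infty}^{\infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} g\Bigl( \tilde{f} + \frac{j}{T_s} \Bigr) \, e^{-i 2 \pi \tilde{f} \ell T_s} \, d\tilde{f}
        = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \sum_{j=-\infty}^{\infty} g\Bigl( \tilde{f} + \frac{j}{T_s} \Bigr) \, e^{-i 2 \pi \tilde{f} \ell T_s} \, d\tilde{f}
        = \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \Bigl( \frac{1}{\sqrt{T_s}} \sum_{j=-\infty}^{\infty} g\Bigl( \tilde{f} + \frac{j}{T_s} \Bigr) \Bigr) \sqrt{T_s} \, e^{-i 2 \pi \tilde{f} \ell T_s} \, d\tilde{f},    (11.16)


which is the ℓ-th Fourier Series Coefficient of the function in (11.15). Here the first
equality follows by substituting −ℓT_s for t in (11.11); the second by partitioning the
region of integration into intervals of length 1/T_s; the third by the change of variable
f̃ ≜ f − j/T_s; the fourth by the periodicity of the complex exponentials; the fifth by
Fubini’s Theorem, which allows us to swap the order of summation and integration;
and the final equality by multiplying and dividing by √T_s.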


An example of a function f ↦ g(f) satisfying (11.12) is plotted in Figure 11.1.
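
A concrete instance of Nyquist’s Criterion can also be checked numerically. In the Python sketch below we take, as our own illustrative choice, the pulse t ↦ sinc²(t/T_s), whose Fourier Transform is a triangle of height T_s supported on |f| ≤ 1/T_s. The sketch checks both sides of the theorem: the pulse vanishes at the nonzero multiples of T_s, and the shifted copies of the triangle add up to T_s. The value of T_s and the function names are assumptions.

```python
import math

Ts = 0.5                                   # hypothetical baud period

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def v(t):
    """Time-domain pulse sinc^2(t/Ts)."""
    return sinc(t / Ts) ** 2

def g(f):
    """Fourier Transform of v: a triangle of height Ts supported on |f| <= 1/Ts."""
    return Ts * max(0.0, 1.0 - abs(f) * Ts)

def folded(f, J=3):
    """Partial sum over j = -J..J of g(f + j/Ts)."""
    return sum(g(f + j / Ts) for j in range(-J, J + 1))

# Time-domain side: samples of v at the multiples of Ts.
samples = [v(l * Ts) for l in range(-4, 5)]
# Frequency-domain side: largest deviation of the folded sum from Ts
# on a grid covering [-1/(2*Ts), 1/(2*Ts)].
worst_gap = max(abs(folded(k / (40 * Ts) - 1 / (2 * Ts)) - Ts) for k in range(41))
print(samples[4], worst_gap)
```

In this example only finitely many shifted copies are nonzero at any frequency, so the sum converges at every f and the subtleties of Note 11.3.3 do not arise.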


Corollary 11.3.4 (Characterization of Shift-Orthonormal Pulses). Let φ: ℝ → ℂ
be energy-limited and let T_s be positive. Then the condition


    \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi^{*}(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\}, \qquad \ell, \ell' \in \mathbb{Z}    (11.17)


is equivalent to the condition


    \sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl( f + \frac{j}{T_s} \Bigr) \Bigr|^2 \equiv T_s,    (11.18)


i.e., to the condition that the set of frequencies f ∈ ℝ for which the LHS of (11.18)
is not equal to T_s is of Lebesgue measure zero.³


³ It is a simple technical matter to verify that the question as to whether or not (11.18) is
satisfied outside a set of frequencies of Lebesgue measure zero does not depend on which element
in the equivalence class of the ℒ_2-Fourier Transform of φ is considered.
Figure 11.1: A function g(·) satisfying (11.12).

Proof. By Proposition 11.2.3, Condition (11.17) can be equivalently expressed in
terms of the self-similarity function as


    R_{\phi\phi}(m T_s) = \mathrm{I}\{m = 0\}, \qquad m \in \mathbb{Z}.    (11.19)


The result now follows from the integral representation of the self-similarity
function R_φφ (Proposition 11.2.2 (iv)) and from Theorem 11.3.2 (with the additional
simplification that for every j ∈ ℤ the function f ↦ |φ̂(f + j/T_s)|² is nonnegative, so
the sum on the LHS of (11.18) converges (possibly to +∞) for every f ∈ ℝ).


An extremely important consequence of Corollary 11.3.4 is the following corollary 
about the minimum bandwidth of a pulse φ satisfying the orthonormality condition
(11.1). 


Corollary 11.3.5 (Minimum Bandwidth of Shift-Orthonormal Pulses). Let T_s > 0
be fixed, and let φ be an energy-limited signal that is bandlimited to W Hz. If the
time shifts of φ by integer multiples of T_s are orthonormal, then


    W \ge \frac{1}{2T_s}.    (11.20)

Equality is achieved if

    |\hat{\phi}(f)| = \sqrt{T_s} \; \mathrm{I}\Bigl\{ |f| \le \frac{1}{2T_s} \Bigr\}, \qquad f \in \mathbb{R}    (11.21)

and, in particular, by the sinc(·) pulse

    \phi(t) = \frac{1}{\sqrt{T_s}} \operatorname{sinc}\Bigl( \frac{t}{T_s} \Bigr), \qquad t \in \mathbb{R}    (11.22)


or any time-shift thereof.


Proof. Figure 11.2 illustrates why φ cannot satisfy (11.18) if (11.20) is violated.
The figure should also convince you of the conditions for equality in (11.20).


For the algebraically-inclined readers we prove the corollary by showing that if
W ≤ 1/(2T_s), then (11.18) can only be satisfied if φ satisfies (11.21) (outside a set
of frequencies of Lebesgue measure zero).⁴ To see this, consider the sum


    \sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl( f + \frac{j}{T_s} \Bigr) \Bigr|^2    (11.23)

for frequencies f in the open interval (−1/(2T_s), +1/(2T_s)). The key observation in the
proof is that for frequencies in this open interval, if W ≤ 1/(2T_s), then all the terms
in the sum (11.23) are zero, except for the j = 0 term. That is,


    \sum_{j=-\infty}^{\infty} \Bigl| \hat{\phi}\Bigl( f + \frac{j}{T_s} \Bigr) \Bigr|^2 = |\hat{\phi}(f)|^2, \qquad \Bigl( W \le \frac{1}{2T_s}, \; f \in \Bigl( -\frac{1}{2T_s}, +\frac{1}{2T_s} \Bigr) \Bigr).    (11.24)


⁴ In the remainder of the proof we assume that φ̂(f) is zero for frequencies f satisfying |f| > W.
The proof can be easily adjusted to account for the fact that, for frequencies |f| > W, it is possible
that φ̂(·) be nonzero on a set of Lebesgue measure zero.
To convince yourself of (11.24), consider, for example, the term corresponding to
j = 1, namely, |φ̂(f + 1/T_s)|². By the definition of bandwidth, it is zero whenever
|f + 1/T_s| > W, i.e., whenever f > −1/T_s + W or f < −1/T_s − W. Since the
former category f > −1/T_s + W includes (by our assumption that W ≤ 1/(2T_s))
all frequencies f > −1/(2T_s), we conclude that the term corresponding to j = 1
is zero for all the frequencies f in the open interval (−1/(2T_s), +1/(2T_s)). More generally,
the j-th term |φ̂(f + j/T_s)|² is zero for all frequencies f satisfying the condition
|f + j/T_s| > W, a condition that is satisfied (assuming j ≠ 0 and W ≤ 1/(2T_s)) by
the frequencies in the open interval that is of interest to us, (−1/(2T_s), +1/(2T_s)).


For W ≤ 1/(2T_s) we thus obtain from (11.24) that the condition (11.18) implies
(11.21), and, in particular, that W = 1/(2T_s).


Functions satisfying (11.21) are seldom used in digital communication because they
typically decay like 1/t so that even if the transmitted symbols X_ℓ are bounded,
the signal X(t) may take on very high values (albeit quite rarely). Consequently,
the pulses φ that are used in practice have a larger bandwidth than 1/(2T_s).


This leads to the following definition. 


Definition 11.3.6 (Excess Bandwidth). The excess bandwidth in percent of a
signal φ relative to T_s > 0 is defined as


    100\% \left( \frac{\text{bandwidth of } \phi}{1/(2T_s)} - 1 \right).    (11.25)
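
The definition translates directly into a one-line computation. The Python sketch below evaluates it for two bandwidths; the function name and the baud period of one millisecond are hypothetical.

```python
def excess_bandwidth_percent(W, Ts):
    """Excess bandwidth, in percent, of a pulse of bandwidth W Hz relative to Ts."""
    return 100.0 * (W / (1.0 / (2.0 * Ts)) - 1.0)

Ts = 1e-3                                      # hypothetical baud period: 1 ms
print(excess_bandwidth_percent(500.0, Ts))     # the minimum bandwidth 1/(2*Ts)
print(excess_bandwidth_percent(750.0, Ts))     # 1.5 times the minimum
```

A pulse at the minimum bandwidth 1/(2T_s) thus has zero excess bandwidth, and one of bandwidth 1/T_s has an excess bandwidth of 100%.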


The following corollary to Corollary 11.3.4 is useful for the understanding of real 
signals of excess bandwidth smaller than 100%. 


Corollary 11.3.7 (Band-Edge Symmetry). Let T_s be positive, and let φ be a real
energy-limited signal that is bandlimited to W Hz, where W < 1/T_s so φ is of excess
bandwidth smaller than 100%. Then the time shifts of φ by integer multiples of T_s
are orthonormal if, and only if, f ↦ |φ̂(f)|² satisfies the band-edge symmetry
condition⁵


    \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} - f \Bigr) \Bigr|^2 + \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} + f \Bigr) \Bigr|^2 \equiv T_s, \qquad 0 < f \le \frac{1}{2T_s}.    (11.26)


Proof. We first note that, since we have assumed that W < 1/T_s, only the terms
corresponding to j = −1, j = 0, and j = 1 contribute to the sum on the LHS of
(11.18) for f ∈ (−1/(2T_s), +1/(2T_s)). Moreover, since φ is by hypothesis real, it follows
that |φ̂(−f)| = |φ̂(f)|, so the sum on the LHS of (11.18) is a symmetric function
of f. Thus, the sum is equal to T_s on the interval (−1/(2T_s), +1/(2T_s)) if, and only if, it is
equal to T_s on the interval (0, +1/(2T_s)). For frequencies in this shorter interval only
two terms in the sum contribute: those corresponding to j = 0 and j = −1. We


⁵ Condition (11.26) should be understood to indicate that the LHS and RHS of (11.26) are
equal for all frequencies 0 < f ≤ 1/(2T_s) outside a set of Lebesgue measure zero. Again, we
ignore this issue in the proof and assume that φ̂(f) is zero for all |f| > W.
Figure 11.2: If W < 1/(2T_s), then all the terms of the form |φ̂(f + j/T_s)|² are zero
over the shaded frequencies W < |f| < 1/(2T_s). Thus, for W < 1/(2T_s) the sum
Σ_{j=−∞}^{∞} |φ̂(f + j/T_s)|² cannot be equal to T_s at any of the shaded frequencies.

Figure 11.3: An example of a choice for |φ̂(·)|² satisfying the band-edge symmetry
condition (11.26).


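thus conclude that, for real signals of excess bandwidth smaller than 100%, the
condition (11.18) is equivalent to the condition


    |\hat{\phi}(f)|^2 + \Bigl| \hat{\phi}\Bigl( f - \frac{1}{T_s} \Bigr) \Bigr|^2 \equiv T_s, \qquad 0 \le f < \frac{1}{2T_s}.

Substituting f′ ≜ 1/(2T_s) − f in this condition leads to the condition

    \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} - f' \Bigr) \Bigr|^2 + \Bigl| \hat{\phi}\Bigl( -f' - \frac{1}{2T_s} \Bigr) \Bigr|^2 \equiv T_s, \qquad 0 < f' \le \frac{1}{2T_s},

which, in view of the symmetry of |φ̂(·)|, is equivalent to

    \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} - f' \Bigr) \Bigr|^2 + \Bigl| \hat{\phi}\Bigl( f' + \frac{1}{2T_s} \Bigr) \Bigr|^2 \equiv T_s, \qquad 0 < f' \le \frac{1}{2T_s},

i.e., to (11.26).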




Note 11.3.8. The band-edge symmetry condition (11.26) has a nice geometric
interpretation. This is best seen by rewriting the condition in the form


    \underbrace{ \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} - f' \Bigr) \Bigr|^2 - \frac{T_s}{2} }_{= \tilde{g}(-f')}
        = - \underbrace{ \Bigl( \Bigl| \hat{\phi}\Bigl( \frac{1}{2T_s} + f' \Bigr) \Bigr|^2 - \frac{T_s}{2} \Bigr) }_{= \tilde{g}(f')},
        \qquad 0 < f' \le \frac{1}{2T_s},    (11.27)


which demonstrates that the band-edge condition is equivalent to the condition
that the plot of f ↦ |φ̂(f)|² in the interval 0 < f < 1/T_s be invariant with
respect to a 180°-rotation around the point (1/(2T_s), T_s/2). In other words, the function
g̃: f′ ↦ |φ̂(1/(2T_s) + f′)|² − T_s/2 should be anti-symmetric for 0 < f′ ≤ 1/(2T_s). I.e., it
should satisfy


    \tilde{g}(-f') = -\tilde{g}(f'), \qquad 0 < f' \le \frac{1}{2T_s}.

Figure 11.4: A plot of f ↦ |φ̂(f)|² as given in (11.30) with β = 0.5.


Figure 11.3 is a plot over the interval [0, 1/T_s) of a mapping f ↦ |φ̂(f)|² that
satisfies the band-edge symmetry condition (11.26).


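A popular choice of φ is based on the raised-cosine family of functions. For every
0 < β < 1 and every T_s > 0, the raised-cosine function is given by the mapping


    f \mapsto \begin{cases}
        T_s & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
        \frac{T_s}{2} \Bigl( 1 + \cos\Bigl( \frac{\pi T_s}{\beta} \Bigl( |f| - \frac{1-\beta}{2T_s} \Bigr) \Bigr) \Bigr) & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
        0 & \text{if } |f| > \frac{1+\beta}{2T_s}.
    \end{cases}    (11.28)
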


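Choosing φ so that its Fourier Transform is the square root of the raised-cosine
mapping (11.28)


    \hat{\phi}(f) = \begin{cases}
        \sqrt{T_s} & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
        \sqrt{T_s} \, \cos\Bigl( \frac{\pi T_s}{2\beta} \Bigl( |f| - \frac{1-\beta}{2T_s} \Bigr) \Bigr) & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
        0 & \text{if } |f| > \frac{1+\beta}{2T_s},
    \end{cases}    (11.29)

results in φ being real with

    |\hat{\phi}(f)|^2 = \begin{cases}
        T_s & \text{if } 0 \le |f| \le \frac{1-\beta}{2T_s}, \\
        \frac{T_s}{2} \Bigl( 1 + \cos\Bigl( \frac{\pi T_s}{\beta} \Bigl( |f| - \frac{1-\beta}{2T_s} \Bigr) \Bigr) \Bigr) & \text{if } \frac{1-\beta}{2T_s} < |f| \le \frac{1+\beta}{2T_s}, \\
        0 & \text{if } |f| > \frac{1+\beta}{2T_s},
    \end{cases}    (11.30)

as depicted in Figure 11.4 for β = 0.5.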


Using (11.29) and using the band-edge symmetry criterion (Corollary 11.3.7), it
can be readily verified that the time shifts of φ by integer multiples of T_s are
orthonormal. Moreover, by (11.29), φ is bandlimited to (1 + β)/(2T_s) Hz. It is
thus of excess bandwidth β × 100%. For every 0 < β < 1 we have thus found a
pulse φ of excess bandwidth β × 100% whose time shifts by integer multiples of T_s
are orthonormal.
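
The verification via band-edge symmetry can also be carried out numerically. The Python sketch below implements the raised-cosine mapping (11.28), which is also |φ̂(f)|² in (11.30), and checks on a grid that the sum of its values at equal distances on either side of the band edge 1/(2T_s) equals T_s. The parameter values T_s = 2 and β = 0.5, the grid, and the function names are our own assumptions.

```python
import math

def raised_cosine(f, Ts, beta):
    """The raised-cosine mapping of parameters Ts > 0 and 0 < beta < 1."""
    a = abs(f)
    lo = (1 - beta) / (2 * Ts)
    hi = (1 + beta) / (2 * Ts)
    if a <= lo:
        return Ts
    if a <= hi:
        return 0.5 * Ts * (1 + math.cos(math.pi * Ts / beta * (a - lo)))
    return 0.0

def band_edge_sum(f, Ts, beta):
    """Sum of the values at distance f on either side of the band edge 1/(2*Ts)."""
    edge = 1 / (2 * Ts)
    return raised_cosine(edge - f, Ts, beta) + raised_cosine(edge + f, Ts, beta)

def folded_sum(f, Ts, beta):
    """Sum over j of the mapping at f + j/Ts; for |f| <= 1/(2*Ts) and beta < 1
    only j = -1, 0, 1 can contribute."""
    return sum(raised_cosine(f + j / Ts, Ts, beta) for j in (-1, 0, 1))

Ts, beta = 2.0, 0.5
# Offsets f = k/(100*Ts), k = 1..50, cover 0 < f <= 1/(2*Ts).
worst = max(abs(band_edge_sum(k / (100 * Ts), Ts, beta) - Ts) for k in range(1, 51))
print("largest deviation from Ts:", worst)
```

A grid check of this kind is of course no substitute for the short trigonometric argument, but it is a convenient guard against errors when implementing the pulse.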
Figure 11.5: The pulse φ(·) of (11.31) with β = 0.5 and its self-similarity
function R_φφ(·) of (11.32).


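In the time domain


    \phi(t) = \frac{4\beta}{\pi \sqrt{T_s}} \,
        \frac{ \cos\bigl( (1+\beta) \pi \frac{t}{T_s} \bigr) + \dfrac{ \sin\bigl( (1-\beta) \pi \frac{t}{T_s} \bigr) }{ 4 \beta \frac{t}{T_s} } }
             { 1 - \bigl( 4 \beta \frac{t}{T_s} \bigr)^2 }, \qquad t \in \mathbb{R},    (11.31)


with corresponding self-similarity function


    R_{\phi\phi}(\tau) = \operatorname{sinc}\Bigl( \frac{\tau}{T_s} \Bigr) \, \frac{ \cos( \pi \beta \tau / T_s ) }{ 1 - 4 \beta^2 \tau^2 / T_s^2 }, \qquad \tau \in \mathbb{R}.    (11.32)
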


The pulse φ of (11.31) is plotted in Figure 11.5 (top) for β = 0.5. Its self-similarity
function (11.32) is plotted in the same figure (bottom). That the time shifts of φ
by integer multiples of T_s are orthonormal can be verified again by observing that
R_φφ as given in (11.32) satisfies R_φφ(ℓT_s) = I{ℓ = 0} for all ℓ ∈ ℤ.
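
This last observation is easy to confirm numerically. The Python sketch below evaluates the self-similarity function (11.32) at integer multiples of T_s for a few values of β. The one delicate point is that the formula has removable singularities at |τ| = T_s/(2β), where numerator and denominator both vanish; the sketch substitutes the limiting value π/4 of the cosine-over-denominator factor there. The tolerance, the value T_s = 2, and the function names are our own assumptions.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x)/(pi*x), with value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def self_similarity_rc(tau, Ts, beta):
    """sinc(tau/Ts) * cos(pi*beta*tau/Ts) / (1 - 4*beta^2*tau^2/Ts^2)."""
    x = tau / Ts
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-12:
        ratio = math.pi / 4        # limit of cos(pi*beta*x)/denom as 2*beta*|x| -> 1
    else:
        ratio = math.cos(math.pi * beta * x) / denom
    return sinc(x) * ratio

Ts = 2.0
for beta in (0.25, 0.5, 0.75):
    for l in range(-5, 6):
        expected = 1.0 if l == 0 else 0.0
        assert abs(self_similarity_rc(l * Ts, Ts, beta) - expected) < 1e-12
```

For β = 0.5 the singular points coincide with τ = ±T_s, so the special case is exercised by the check itself.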

Notice also that if φ(·) is chosen as in (11.31), then for all 0 < β < 1, the pulse φ(·)
decays like 1/t². This decay property combined with the fact that the infinite sum
Σ_{ν=1}^{∞} ν^{−2} converges (Rudin, 1976, Chapter 3, Theorem 3.28) will prove useful in
Section 14.3 when we discuss the power in PAM.
11.4 The Self-Similarity Function of Integrable Signals


This section is a bit technical and can be omitted at first reading. In it we define 
the self-similarity function for integrable signals that are not necessarily energy- 
limited, and we then compute the Fourier Transform of the so-defined self-similarity 
function. 


Recall that a Lebesgue measurable complex signal v: ℝ → ℂ is integrable if
∫_{−∞}^{∞} |v(t)| dt < ∞ and that the class of integrable signals is denoted by ℒ_1. For
such signals there may be τ’s for which the integral in (11.2) is undefined. For
example, if v is not energy-limited, then the integral in (11.2) will be infinite at
τ = 0. Nevertheless, we can discuss the self-similarity function of such signals by
adopting the convolution representation of Proposition 11.2.2 as the definition. We
thus define the self-similarity function R_vv of an integrable signal v ∈ ℒ_1 as


    R_{vv} \triangleq v \star \overleftarrow{v}^{*}, \qquad v \in \mathcal{L}_1,    (11.33)


but we need some clarification. Since v is integrable, and since this implies that
its reflected image t ↦ v(−t) is also integrable, it follows that the convolution in (11.33) is
a convolution between two integrable signals. As such, we are guaranteed by the
discussion leading to (5.9) that the integral


    \int_{-\infty}^{\infty} v(\sigma) \, \overleftarrow{v}^{*}(\tau - \sigma) \, d\sigma = \int_{-\infty}^{\infty} v(t + \tau) \, v^{*}(t) \, dt


is defined for all τ’s outside a set of Lebesgue measure zero. (This set of Lebesgue
measure zero will include the point τ = 0 if v is not of finite energy.) For τ’s inside
this set of measure zero we define the self-similarity function to be zero. The value
zero is quite arbitrary because, irrespective of the value we choose for such τ’s, we
are guaranteed by (5.9) that the so-defined self-similarity function R_vv is integrable


    \int_{-\infty}^{\infty} |R_{vv}(\tau)| \, d\tau \le \|v\|_1^2, \qquad v \in \mathcal{L}_1,    (11.34)


and that its ℒ_1-Fourier Transform is given by the product of the ℒ_1-Fourier Transform
of v and the ℒ_1-Fourier Transform of t ↦ v*(−t), i.e.,


    \hat{R}_{vv}(f) = |\hat{v}(f)|^2, \qquad \bigl( v \in \mathcal{L}_1, \; f \in \mathbb{R} \bigr).    (11.35)


11.5 Exercises 


Exercise 11.1 (Passband Signaling). Let f_0, T_s > 0 be fixed.
(i) Show that a signal x is a Nyquist Pulse of parameter T_s if, and only if, the signal
t ↦ e^{i2πf_0 t} x(t) is such a pulse.
(ii) Show that if x is a Nyquist Pulse of parameter T_s, then so is t ↦ cos(2πf_0 t) x(t).


(iii) If t ↦ cos(2πf_0 t) x(t) is a Nyquist Pulse of parameter T_s, must x also be one?
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Exercise 11.2 (The Self-Similarity Function of a Delayed Signal). Let u be an energy- 
limited signal, and let the signal v be given by v: t +> u(t—to). Express the self-similarity 
function of v in terms of the self-similarity of u and to. 


Exercise 11.3 (The Self-Similarity Function of a Frequency Shifted Signal). Let $u$ be
an energy-limited complex signal, and let the signal $v$ be given by $v\colon t \mapsto u(t)\, e^{i2\pi f_0 t}$ for
some $f_0 \in \mathbb{R}$. Express the self-similarity function of $v$ in terms of $f_0$ and the self-similarity
function of $u$.


Exercise 11.4 (A Self-Similarity Function). Compute and plot the self-similarity function
of the signal $t \mapsto A\, (1 - |t|/T)\, I\{|t| \le T\}$.


Exercise 11.5 (Symmetry of the FT of the Self-Similarity Function of a Real Signal).
Show that if $\phi$ is an integrable real signal, then the FT of its self-similarity function is
symmetric:

$$\hat{R}_{\phi\phi}(f) = \hat{R}_{\phi\phi}(-f), \quad f \in \mathbb{R}, \qquad \phi \in \mathcal{L}_1,\ \phi \text{ real}.$$


Exercise 11.6 (The Self-Similarity Function is Positive Definite). Show that if $v$ is an
energy-limited signal, $n$ is a positive integer, $\alpha_1, \ldots, \alpha_n \in \mathbb{C}$, and $t_1, \ldots, t_n \in \mathbb{R}$, then

$$\sum_{j=1}^{n} \sum_{\ell=1}^{n} \alpha_j\, \alpha_\ell^*\, R_{vv}(t_j - t_\ell) \ge 0.$$

Hint: Compute the energy in the signal $t \mapsto \sum_{j=1}^{n} \alpha_j\, v(t + t_j)$.


Exercise 11.7 (Relaxing the Orthonormality Condition). What is the minimal bandwidth
of an energy-limited signal whose time shifts by even multiples of $T_s$ are orthonormal?
What is the minimal bandwidth of an energy-limited signal whose time shifts by odd
multiples of $T_s$ are orthonormal?


Exercise 11.8 (A Specific Signal). Let $p$ be the complex energy-limited bandlimited signal
whose FT $\hat{p}$ is given by

$$\hat{p}(f) = T_s\, \bigl(1 - |T_s f - 1|\bigr)\, I\Bigl\{0 \le f \le \frac{2}{T_s}\Bigr\}, \quad f \in \mathbb{R}.$$

(i) Plot $p(\cdot)$.
(ii) Is $p(\cdot)$ a Nyquist Pulse of parameter $T_s$?
(iii) Is the real part of $p(\cdot)$ a Nyquist Pulse of parameter $T_s$?
(iv) What about the imaginary part of $p(\cdot)$?


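Exercise 11.9 (Nyquist’s Third Criterion). We say that an energy-limited signal $\psi(\cdot)$
satisfies Nyquist’s Third Criterion if

$$\int_{(2\nu-1)T_s/2}^{(2\nu+1)T_s/2} \psi(t)\, dt = \begin{cases} 1 & \text{if } \nu = 0, \\ 0 & \text{if } \nu \in \mathbb{Z} \setminus \{0\}. \end{cases} \tag{11.36}$$

(i) Express the LHS of (11.36) as an inner product between $\psi$ and some function $g_\nu$.

(ii) Show that (11.36) is equivalent to

$$T_s \int_{-\infty}^{\infty} \hat{\psi}(f)\, e^{-i2\pi f \nu T_s}\, \operatorname{sinc}(T_s f)\, df = \begin{cases} 1 & \text{if } \nu = 0, \\ 0 & \text{if } \nu \in \mathbb{Z} \setminus \{0\}. \end{cases}$$

(iii) Show that, loosely speaking, $\psi$ satisfies Nyquist’s Third Criterion if, and only if,

$$\sum_{j=-\infty}^{\infty} \hat{\psi}\Bigl(f - \frac{j}{T_s}\Bigr)\, \operatorname{sinc}(T_s f - j)$$

is indistinguishable from the all-one function. More precisely, if and only if,

$$\lim_{J \to \infty} \int_{-\frac{1}{2T_s}}^{\frac{1}{2T_s}} \Bigl| 1 - \sum_{j=-J}^{J} \hat{\psi}\Bigl(f - \frac{j}{T_s}\Bigr)\, \operatorname{sinc}(T_s f - j) \Bigr|\, df = 0.$$

(iv) What is the FT of the pulse of least bandwidth that satisfies Nyquist’s Third
Criterion with respect to the baud $T_s$? What is its bandwidth?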


Exercise 11.10 (Multiplication by a Carrier).

(i) Let $u$ be an energy-limited complex signal that is bandlimited to W Hz, and let
$f_0 > W$ be given. Let $v$ be the signal $v\colon t \mapsto u(t) \cos(2\pi f_0 t)$. Express the self-
similarity function of $v$ in terms of $f_0$ and the self-similarity function of $u$.

(ii) Let the signal $\phi$ be given by $\phi\colon t \mapsto \sqrt{2} \cos(2\pi f_c t)\, w(t)$, where $f_c > W/2 > 0$;
where $4 f_c T_s$ is an odd integer; and where $w$ is a real energy-limited signal that
is bandlimited to W/2 Hz and whose time shifts by integer multiples of $(2T_s)$
are orthonormal. Show that the time shifts of $\phi$ by integer multiples of $T_s$ are
orthonormal.


Exercise 11.11 (The Self-Similarity of a Convolution). Let $p$ and $q$ be integrable signals
of self-similarity functions $R_{pp}$ and $R_{qq}$. Show that the self-similarity function of their
convolution $p \star q$ is indistinguishable from $R_{pp} \star R_{qq}$.


Chapter 12 


Stochastic Processes: Definition 


12.1 Introduction and Continuous-Time Heuristics 


In this chapter we shall define stochastic processes. Our definition will be general so 
as to include the continuous-time stochastic processes of the type we encountered 
in Section 10.2 and also discrete-time processes. 


In Section 10.2 we saw that since the data bits that we wish to communicate 
are random, the transmitted waveform is a stochastic process. But stochastic 
processes play an important role in Digital Communications not only in modeling 
the transmitted signals: they are also used to model the noise in the system and 
other sources of impairments. 


The stochastic processes we encountered in Section 10.2 are continuous-time pro- 
cesses. We proposed that you think about such a process as a real-valued function 
of two variables: “time” and “luck.” By “luck” we mean the realization of all the 
random components of the system, e.g., the bits to be sent, the realization of the 
noise processes (that we shall discuss later), or any other sources of randomness in 
the system. 


Somewhat more precisely, recall that a probability space is defined as a triplet
$(\Omega, \mathcal{F}, P)$, where the set $\Omega$ is the set of experiment outcomes, the set $\mathcal{F}$ is the set
of events, and where $P(\cdot)$ assigns probabilities to the various events. A measurable
real-valued function of the outcome is a random variable, and a function of time and
the experiment outcome is a random process or a stochastic process. A continuous-
time stochastic process X is thus a mapping

$$X\colon \Omega \times \mathbb{R} \to \mathbb{R}$$
$$(\omega, t) \mapsto X(\omega, t).$$


If we fix some experiment outcome $\omega \in \Omega$, then the random process can be regarded
as a function of one argument: time. This function is sometimes called a sample-
path, trajectory, sample-path realization, or a sample function:

$$X(\omega, \cdot)\colon \mathbb{R} \to \mathbb{R}$$
$$t \mapsto X(\omega, t).$$


Figure 12.1: The pulse shape $g\colon t \mapsto (1 - 4|t|/T_s)\, I\{|t| \le T_s/4\}$, and the sample
function $t \mapsto \sum_{\ell=-4}^{4} x_\ell\, g(t - \ell T_s)$ for one particular tuple $(x_{-4}, x_{-3}, \ldots, x_4)$
of $\pm 1$ values.


Similarly, if we fix an epoch $t \in \mathbb{R}$ and view the stochastic process as a function of
“luck” only, we obtain a random variable:

$$X(\cdot, t)\colon \Omega \to \mathbb{R}$$
$$\omega \mapsto X(\omega, t).$$


This random variable is sometimes called the value of the process at time t or 
the time-t sample of the process. 


Figure 12.1 shows the pulse shape $g\colon t \mapsto (1 - 4|t|/T_s)\, I\{|t| \le T_s/4\}$ and a sample-
path of the PAM signal

$$X(t) = \sum_{\ell=-4}^{4} X_\ell\, g(t - \ell T_s) \tag{12.1}$$

with $\{X_\ell\}$ taking value in the set $\{-1, +1\}$. Notice that in this example the
functions $t \mapsto g(t - \ell T_s)$ and $t \mapsto g(t - \ell' T_s)$ do not “overlap” if $\ell \ne \ell'$.
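The two views of a stochastic process, as a function of time for fixed “luck” and as a random variable for fixed time, can be made concrete in a few lines of Python. The sketch below is only an illustration: the choice $T_s = 1$, the seed, and the helper names `g`, `draw_symbols`, and `sample_path` are assumptions of this example and do not appear in the text.

```python
import random

TS = 1.0  # baud period T_s (arbitrary illustrative value)

def g(t, ts=TS):
    """Triangular pulse of Figure 12.1: 1 - 4|t|/T_s for |t| <= T_s/4, else 0."""
    return 1.0 - 4.0 * abs(t) / ts if abs(t) <= ts / 4 else 0.0

def draw_symbols(rng, ells=range(-4, 5)):
    """One realization of "luck": a dict ell -> x_ell with x_ell in {-1, +1}."""
    return {ell: rng.choice((-1, +1)) for ell in ells}

def sample_path(x, t, ts=TS):
    """Evaluate the sample function t -> sum_ell x_ell g(t - ell T_s) at time t."""
    return sum(x_ell * g(t - ell * ts, ts) for ell, x_ell in x.items())

rng = random.Random(0)   # fixed seed so that the run is reproducible
x = draw_symbols(rng)    # fixing the outcome fixes the whole trajectory
path_on_grid = [sample_path(x, k * TS / 8) for k in range(-36, 37)]
```

Because the shifted pulses do not overlap, the trajectory equals $x_\ell$ at time $\ell T_s$ and vanishes halfway between two consecutive epochs, whatever the realization of the symbols.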


Figure 12.2 shows the pulse shape

$$g\colon t \mapsto \begin{cases} 1 - \frac{4}{3T_s}\, |t| & \text{if } |t| \le \frac{3T_s}{4}, \\ 0 & \text{if } |t| > \frac{3T_s}{4}, \end{cases} \quad t \in \mathbb{R}, \tag{12.2}$$

and a sample-path of the PAM signal (12.1) for $\{X_\ell\}$ taking value in the set
$\{-1, +1\}$. In this example the mappings $t \mapsto g(t - \ell T_s)$ and $t \mapsto g(t - \ell' T_s)$ do
overlap (when $\ell' \in \{\ell - 1, \ell, \ell + 1\}$).


Figure 12.2: The pulse shape $g$ of (12.2) and the trajectory $t \mapsto \sum_{\ell=-4}^{4} x_\ell\, g(t - \ell T_s)$
for one particular tuple $(x_{-4}, x_{-3}, \ldots, x_4)$ of $\pm 1$ values.


12.2. A Formal Definition 


We next give a formal definition of a stochastic process, which is also called a 
random process, or a random function. 


Definition 12.2.1 (Stochastic Process). A stochastic process $(X(t),\ t \in \mathcal{T})$ is an
indexed family of random variables that are defined on a common probability space
$(\Omega, \mathcal{F}, P)$. Here $\mathcal{T}$ denotes the indexing set and $X(t)$ (or sometimes $X_t$) denotes
the random variable indexed by $t$.


Thus, $X(t)$ is the random variable to which $t \in \mathcal{T}$ is mapped. For each $t \in \mathcal{T}$
we have that $X(t)$ is a random variable, i.e., a measurable mapping from the
experiment outcomes set $\Omega$ to the reals.¹


A stochastic process $(X(t),\ t \in \mathcal{T})$ is said to be centered or of zero mean if all
the random variables in the family are of zero mean, i.e., if for every $t \in \mathcal{T}$ we have
$E[X(t)] = 0$. It is said to be of finite variance if all the random variables in the
family are of finite variance, i.e., if $E[X^2(t)] < \infty$ for all $t \in \mathcal{T}$.


The case where the indexing set $\mathcal{T}$ comprises only one element is not particularly
exciting because in this case the stochastic process is just a random variable with
fancy packaging. Similarly, when $\mathcal{T}$ is finite, the SP is just a random vector or a
tuple of random variables in disguise. The cases that will be of most interest are
enumerated below.


(i) When the indexing set $\mathcal{T}$ is the set of integers $\mathbb{Z}$, the stochastic process is
said to be a discrete-time stochastic process and in this case it is simply
a bi-infinite sequence of random variables

$$\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots$$

For discrete-time stochastic processes it is customary to denote the random
variable to which $\nu \in \mathbb{Z}$ is mapped by $X_\nu$ rather than $X(\nu)$ and to refer to
$X_\nu$ as the time-$\nu$ sample of the process $(X_\nu,\ \nu \in \mathbb{Z})$.

(ii) When the indexing set is the set of positive integers $\mathbb{N}$, the stochastic process
is said to be a one-sided discrete-time stochastic process and it is simply
a one-sided sequence of random variables

$$X_1, X_2, \ldots$$

Again, we refer to $X_\nu$ as the time-$\nu$ sample of $(X_\nu,\ \nu \in \mathbb{N})$.

(iii) When the indexing set $\mathcal{T}$ is the real line $\mathbb{R}$, the stochastic process is said to
be a continuous-time stochastic process and the random variable $X(t)$
is the time-$t$ sample of $(X(t),\ t \in \mathbb{R})$.


¹ Some authors, e.g., (Doob, 1990), allow for $X(t)$ to take on the values $\pm\infty$ provided that
at each $t \in \mathcal{T}$ this occurs with zero probability, but we, following (Loève, 1963), insist that $X(t)$
only take on finite values.


In dealing with continuous-time stochastic processes we shall usually denote the
process by $(X(t),\ t \in \mathbb{R})$, by $X$, by $X(\cdot)$, or by $(X(t))$. The random variable to
which $t$ is mapped, i.e., the time-$t$ sample of the process, will be denoted by $X(t)$.
Its realization will be denoted by $x(t)$, and the sample-path of the process by $x$ or
$x(\cdot)$. Discrete-time processes will typically be denoted by $(X_\nu,\ \nu \in \mathbb{Z})$ or by $(X_\nu)$.


We shall need only a few results on discrete-time stochastic processes, and those will 
be presented in Chapter 13. Continuous-time stochastic processes will be discussed 
in Chapter 25. 


12.3. Describing Stochastic Processes 


The description of a continuous-time stochastic process in terms of a random vari- 
able (as in Section 10.2), in terms of a finite number of random variables (as in 
PAM signaling), or in terms of an infinite sequence of random variables (as in the 
transmission using PAM signaling of an infinite binary data stream) is particularly 
well suited for describing human-generated stochastic processes or stochastic pro- 
cesses that are generated using a mechanism that we fully understand. We simply 
describe how the stochastic process is synthesized from the random variables. The 
method is less useful when the stochastic process denotes a random signal (such 
as thermal noise or some other interference of unknown origin) that we observe 
rather than generate. In this case we can use measurements and statistical meth- 
ods to analyze the process. Often, the best we can hope for is to be informed 
of the finite-dimensional distributions of the process, a concept that will be 
introduced in Section 25.2.


12.4 Additional Reading


Classic references on stochastic processes to which we shall frequently refer are
(Doob, 1990) and (Loève, 1963). We also recommend (Gikhman and Skorokhod,
1996), (Cramér and Leadbetter, 2004), and (Grimmett and Stirzaker, 2001). For
discrete-time stochastic processes, see (Pourahmadi, 2001) and (Porat, 2008).


12.5 Exercises 


Exercise 12.1 (Objects in a Basement). Let $T_1, T_2, \ldots$ be a sequence of positive random
variables, and let $N_1, N_2, \ldots$ be a sequence of random variables taking value in $\mathbb{N}$. Define

$$X(t) = \sum_{j=1}^{\infty} N_j\, I\{t \ge T_j\}, \quad t \in \mathbb{R}.$$

Draw some sample paths of $(X(t),\ t \in \mathbb{R})$. Assume that at time zero a basement is empty
and that $N_j$ denotes the number of objects in the $j$-th box, which is brought down to the
basement at time $T_j$. Explain why you can think of $X(t)$ as the number of objects in the
basement at time $t$.


Exercise 12.2 (A Queue). Let $S_1, S_2, \ldots$ be a sequence of positive random variables. A
system is turned on at time zero. The first customer arrives at the system at time $S_1$
and the next at time $S_1 + S_2$. More generally, Customer $\eta$ arrives $S_\eta$ minutes after
Customer $(\eta - 1)$. The system serves one customer at a time. It takes the system one
minute to serve each customer, and a customer leaves the system once it has been served.
Let $X(t)$ denote the number of customers in the system at time $t$. Express $X(t)$ in terms
of $S_1, S_2, \ldots$ Is $(X(t),\ t \in \mathbb{R})$ a stochastic process? If so, draw a few of its sample paths.
Compute $\Pr[X(0.5) > 0]$. Express your answer in terms of the distribution of $S_1, S_2, \ldots$


Exercise 12.3 (A Continuous-Time Markov SP). A particle is in State Zero at time $t = 0$.
It stays in that state for $T_1^{(0)}$ seconds and then jumps to State One. It stays in State One
for $T_1^{(1)}$ seconds and then jumps back to State Zero, where it stays for $T_2^{(0)}$ seconds. In
general, $T_\nu^{(0)}$ is the duration of the particle’s stay in State Zero on its $\nu$-th visit to that
state. Similarly, $T_\nu^{(1)}$ is the duration of its stay in State One on its $\nu$-th visit. Assume
that $T_1^{(0)}, T_1^{(1)}, T_2^{(0)}, T_2^{(1)}, \ldots$ are independent with $T_\nu^{(0)}$ being a mean-$\mu_0$
exponential and with $T_\nu^{(1)}$ being a mean-$\mu_1$ exponential for all $\nu \in \mathbb{N}$.

Let $X(t)$ be deterministically equal to zero for $t < 0$, and equal to the particle’s state for
$t \ge 0$.

(i) Plot some sample paths of $(X(t),\ t \in \mathbb{R})$.

(ii) What is the probability that the sample path $t \mapsto X(\omega, t)$ is continuous in the
interval $[0, t)$?

(iii) Conditional on $X(t) = 0$, where $t \ge 0$, what is the distribution of the remaining
duration of the particle’s stay in State Zero?

Hint: An exponential RV $X$ has the memoryless property, i.e., that for every $s, t \ge 0$ we
have $\Pr[X > s + t \mid X > t] = \Pr[X > s]$.


Exercise 12.4 (Peak Power). Let the random variables $(D_j,\ j \in \mathbb{Z})$ be IID, each taking
on the values 0 and 1 equiprobably. Let

$$X(t) = A \sum_{\ell=-\infty}^{\infty} (1 - 2D_\ell)\, g(t - \ell T_s), \quad t \in \mathbb{R},$$

where $A, T_s > 0$ and $g\colon t \mapsto I\{|t| \le 3T_s/4\}$. Find the distribution of the random variable

$$\sup_{t \in \mathbb{R}} |X(t)|.$$

Exercise 12.5 (Sample-Path Continuity). Let the random variables $(D_j,\ j \in \mathbb{Z})$ be IID,
each taking on the values 0 and 1 equiprobably. Let

$$X(t) = A \sum_{\ell=-\infty}^{\infty} (1 - 2D_\ell)\, g(t - \ell T_s), \quad t \in \mathbb{R},$$

where $A, T_s > 0$. Suppose that the function $g\colon \mathbb{R} \to \mathbb{R}$ is continuous and is zero outside
some interval, so $g(t) = 0$ whenever $|t| > T$. Show that for every $\omega \in \Omega$, the sample-path
$t \mapsto X(\omega, t)$ is a continuous function of time.


Exercise 12.6 (Random Sampling Time). Consider the setup of Exercise 12.5, with the
pulse shape $g\colon t \mapsto (1 - 2|t|/T_s)\, I\{|t| \le T_s/2\}$. Further assume that the RV $T$ is in-
dependent of $(D_j,\ j \in \mathbb{Z})$ and uniformly distributed over the interval $[-\delta, \delta]$. Find the
distribution of $X(kT_s + T)$ for any integer $k$.


Exercise 12.7 (A Strange SP). Let $T$ be a mean-one exponential RV, and define the SP
$(X(t),\ t \in \mathbb{R})$ by

$$X(t) = \begin{cases} 1 & \text{if } t = T, \\ 0 & \text{otherwise.} \end{cases}$$

Compute the distribution of $X(t_1)$ and the joint distribution of $X(t_1)$ and $X(t_2)$ for
$t_1, t_2 \in \mathbb{R}$. What is the probability that the sample-path $t \mapsto X(\omega, t)$ is continuous at $t_1$?
What is the probability that the sample-path is a continuous function (everywhere)?


Exercise 12.8 (The Sum of Stochastic Processes: Formalities). Let the stochastic pro-
cesses $(X_1(t),\ t \in \mathbb{R})$ and $(X_2(t),\ t \in \mathbb{R})$ be defined on the same probability space
$(\Omega, \mathcal{F}, P)$. Let $(Y(t),\ t \in \mathbb{R})$ be the SP corresponding to their sum. Express $Y$ as a
mapping from $\Omega \times \mathbb{R}$ to $\mathbb{R}$. What is $Y(\omega, t)$ for $(\omega, t) \in \Omega \times \mathbb{R}$?


Exercise 12.9 (Independent Stochastic Processes). Let the SP $(X_1(t),\ t \in \mathbb{R})$ be de-
fined on the probability space $(\Omega_1, \mathcal{F}_1, P_1)$, and let $(X_2(t),\ t \in \mathbb{R})$ be defined on the
space $(\Omega_2, \mathcal{F}_2, P_2)$. Define a new probability space $(\Omega, \mathcal{F}, P)$ with two stochastic processes
$(\tilde{X}_1(t),\ t \in \mathbb{R})$ and $(\tilde{X}_2(t),\ t \in \mathbb{R})$ such that for every $n \in \mathbb{N}$ and epochs $t_1, \ldots, t_n \in \mathbb{R}$
the following three conditions hold:
1) The joint law of $\tilde{X}_1(t_1), \ldots, \tilde{X}_1(t_n)$ is the same as the joint law of $X_1(t_1), \ldots, X_1(t_n)$.
2) The joint law of $\tilde{X}_2(t_1), \ldots, \tilde{X}_2(t_n)$ is the same as the joint law of $X_2(t_1), \ldots, X_2(t_n)$.
3) The $n$-tuple $\tilde{X}_1(t_1), \ldots, \tilde{X}_1(t_n)$ is independent of the $n$-tuple $\tilde{X}_2(t_1), \ldots, \tilde{X}_2(t_n)$.

Hint: Consider $\Omega = \Omega_1 \times \Omega_2$.


Exercise 12.10 (Pathwise Integration). Let $(X_j,\ j \in \mathbb{Z})$ be IID random variables defined
over the probability space $(\Omega, \mathcal{F}, P)$, with $X_1$ taking on the values 0 and 1 equiprobably.
Define the stochastic process $(X(t),\ t \in \mathbb{R})$ as

$$X(t) = \sum_{j=-\infty}^{\infty} X_j\, I\{j \le t < j + 1\}, \quad t \in \mathbb{R}.$$

For a given $n \in \mathbb{N}$, compute the distribution of the random variable

$$\omega \mapsto \int_0^n X(\omega, t)\, dt.$$


Chapter 13 


Stationary Discrete-Time Stochastic 
Processes 


13.1 Introduction 


This chapter discusses some of the properties of real discrete-time stochastic pro- 
cesses. Extensions to complex discrete-time stochastic processes are discussed in 
Chapter 17. 


13.2 Stationary Processes 


A discrete-time stochastic process is said to be stationary if all equal-length tuples
of consecutive samples have the same joint law. Thus:


Definition 13.2.1 (Stationary Discrete-Time Processes). A discrete-time SP $(X_\nu)$
is said to be stationary or strict sense stationary or strongly stationary
if for every $n \in \mathbb{N}$ and all integers $\eta, \eta'$ the joint distribution of the $n$-tuple
$(X_\eta, \ldots, X_{\eta+n-1})$ is identical to that of the $n$-tuple $(X_{\eta'}, \ldots, X_{\eta'+n-1})$:

$$(X_\eta, \ldots, X_{\eta+n-1}) \stackrel{\mathscr{L}}{=} (X_{\eta'}, \ldots, X_{\eta'+n-1}). \tag{13.1}$$

Here $\stackrel{\mathscr{L}}{=}$ denotes equality of distribution (law) so $X \stackrel{\mathscr{L}}{=} Y$ indicates that the random
variables $X$ and $Y$ have the same distribution; $(X, Y) \stackrel{\mathscr{L}}{=} (W, Z)$ indicates that the
pair $(X, Y)$ and the pair $(W, Z)$ have the same joint distribution; and similarly for
$n$-tuples.

By considering the case where $n = 1$ we obtain that if $(X_\nu)$ is stationary, then the
distribution of $X_\eta$ is the same as the distribution of $X_{\eta'}$, for all $\eta, \eta' \in \mathbb{Z}$. That
is, if $(X_\nu)$ is stationary, then all the random variables in the family $(X_\nu,\ \nu \in \mathbb{Z})$
have the same distribution: the random variable $X_1$ has the same distribution as
the random variable $X_2$, etc. Thus,

$$\bigl( (X_\nu,\ \nu \in \mathbb{Z}) \text{ stationary} \bigr) \implies \bigl( X_\nu \stackrel{\mathscr{L}}{=} X_1,\ \nu \in \mathbb{Z} \bigr). \tag{13.2}$$

By considering in the above definition the case where $n = 2$ we obtain that for a
stationary process $(X_\nu)$ the joint distribution of $X_1, X_2$ is the same as the joint
distribution of $X_\eta, X_{\eta+1}$ for any integer $\eta$. More, however, is true. If $(X_\nu)$ is
stationary, then the joint distribution of $X_\nu, X_{\nu'}$ is the same as the joint distribution
of $X_{\eta+\nu}, X_{\eta+\nu'}$:

$$\bigl( (X_\nu,\ \nu \in \mathbb{Z}) \text{ stationary} \bigr) \implies \bigl( (X_\nu, X_{\nu'}) \stackrel{\mathscr{L}}{=} (X_{\eta+\nu}, X_{\eta+\nu'}),\ \nu, \nu', \eta \in \mathbb{Z} \bigr). \tag{13.3}$$

To prove (13.3) first note that it suffices to treat the case where $\nu \ge \nu'$ because
$(X, Y) \stackrel{\mathscr{L}}{=} (W, Z)$ if, and only if, $(Y, X) \stackrel{\mathscr{L}}{=} (Z, W)$. Next note that stationarity
implies that

$$(X_{\nu'}, \ldots, X_\nu) \stackrel{\mathscr{L}}{=} (X_{\eta+\nu'}, \ldots, X_{\eta+\nu}) \tag{13.4}$$

because both are $(\nu - \nu' + 1)$-length tuples of consecutive samples of the process.
Finally, (13.4) implies that the joint distribution of $(X_{\nu'}, X_\nu)$ is identical to the
joint distribution of $(X_{\eta+\nu'}, X_{\eta+\nu})$ and (13.3) follows.
The above argument can be generalized to more samples. This yields the following
proposition, which gives an alternative definition of stationarity, a definition that
more easily generalizes to continuous-time stochastic processes.

Proposition 13.2.2. A discrete-time SP $(X_\nu,\ \nu \in \mathbb{Z})$ is stationary if, and only if,
for every $n \in \mathbb{N}$, all integers $\nu_1, \ldots, \nu_n \in \mathbb{Z}$, and every $\eta \in \mathbb{Z}$

$$(X_{\nu_1}, \ldots, X_{\nu_n}) \stackrel{\mathscr{L}}{=} (X_{\eta+\nu_1}, \ldots, X_{\eta+\nu_n}). \tag{13.5}$$

Proof. One direction is trivial and simply follows by substituting consecutive in-
tegers for $\nu_1, \ldots, \nu_n$ in (13.5). The proof of the other direction is a straightforward
extension of the argument we used to prove (13.3).


By noting that $(W_1, \ldots, W_n) \stackrel{\mathscr{L}}{=} (Z_1, \ldots, Z_n)$ if, and only if,¹ $\sum_j \alpha_j W_j \stackrel{\mathscr{L}}{=} \sum_j \alpha_j Z_j$
for all $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ we obtain the following equivalent characterization of sta-
tionary processes:


Proposition 13.2.3. A discrete-time SP $(X_\nu)$ is stationary if, and only if, for every
$n \in \mathbb{N}$, all $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and all $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$

$$\sum_{j=1}^{n} \alpha_j X_{\nu_j} \stackrel{\mathscr{L}}{=} \sum_{j=1}^{n} \alpha_j X_{\nu_j + \eta}. \tag{13.6}$$


13.3. Wide-Sense Stationary Stochastic Processes 


Definition 13.3.1 (Wide-Sense Stationary Discrete-Time SP). We say that a
discrete-time SP $(X_\nu,\ \nu \in \mathbb{Z})$ is wide-sense stationary (WSS) or weakly
stationary or covariance stationary or second-order stationary or weak-
sense stationary if the following three conditions are satisfied:

1) The random variables $X_\nu$, $\nu \in \mathbb{Z}$ are all of finite variance:

$$\operatorname{Var}[X_\nu] < \infty, \quad \nu \in \mathbb{Z}. \tag{13.7a}$$

2) The random variables $X_\nu$, $\nu \in \mathbb{Z}$ have identical means:

$$E[X_\nu] = E[X_1], \quad \nu \in \mathbb{Z}. \tag{13.7b}$$

3) The quantity $E[X_{\nu'} X_\nu]$ depends on $\nu'$ and $\nu$ only via $\nu - \nu'$:

$$E[X_{\nu'} X_\nu] = E[X_{\eta+\nu'} X_{\eta+\nu}], \quad \nu, \nu', \eta \in \mathbb{Z}. \tag{13.7c}$$


¹ This follows because the multivariate characteristic function determines the joint distribution
(see Proposition 23.4.4 or (Dudley, 2003, Chapter 9, Section 5, Theorem 9.5.1)) and because
the characteristic functions of all the linear combinations of the components of a random vector
determine the multivariate characteristic function of the random vector (Feller, 1971, Chapter XV,
Section 7).

Note 13.3.2. By considering (13.7c) when $\nu = \nu'$ we obtain that all the samples
of a WSS SP have identical second moments. And since, by (13.7b), they also all
have identical means, it follows that all the samples of a WSS SP have identical
variances:

$$\bigl( (X_\nu,\ \nu \in \mathbb{Z}) \text{ WSS} \bigr) \implies \bigl( \operatorname{Var}[X_\nu] = \operatorname{Var}[X_1],\ \nu \in \mathbb{Z} \bigr). \tag{13.8}$$

An alternative definition of a WSS process in terms of the variance of linear func-
tionals of the process is given below.


Proposition 13.3.3. A finite-variance discrete-time SP $(X_\nu)$ is WSS if, and only
if, for every $n \in \mathbb{N}$, every $\eta, \nu_1, \ldots, \nu_n \in \mathbb{Z}$, and every $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$

$$\sum_{j=1}^{n} \alpha_j X_{\nu_j} \ \text{and} \ \sum_{j=1}^{n} \alpha_j X_{\nu_j + \eta} \ \text{have the same mean \& variance.} \tag{13.9}$$

Proof. The proof is left as an exercise. Alternatively, see the proof of Proposi-
tion 17.5.5.


13.4 Stationarity and Wide-Sense Stationarity 


Comparing (13.9) with (13.6) we see that, for finite-variance stochastic processes, 
stationarity implies wide-sense stationarity, which is the content of the following 
proposition. This explains why stationary processes are sometimes called strong- 
sense stationary and why wide-sense stationary processes are sometimes called 
weak-sense stationary. 


Proposition 13.4.1 (Finite-Variance Stationary Stochastic Processes Are WSS). 
Every finite-variance discrete-time stationary SP is WSS. 


Proof. While this is obvious from (13.9) and (13.6) we shall nevertheless give an
alternative proof because the proof of Proposition 13.3.3 was left as an exercise. The
proof is straightforward and follows directly from (13.2) and (13.3) by noting that if
$X \stackrel{\mathscr{L}}{=} Y$, then $E[X] = E[Y]$ and that if $(X, Y) \stackrel{\mathscr{L}}{=} (W, Z)$, then $E[XY] = E[WZ]$.


It is not surprising that not every WSS process is stationary. Indeed, the definition
of WSS processes only involves means and covariances, so it cannot possibly say
everything regarding the distribution. For example, the process whose samples
are independent with the odd ones taking on the values $\pm 1$ equiprobably and with
the even ones uniformly distributed over the interval $[-\sqrt{3}, +\sqrt{3}]$ is WSS but not
stationary.
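To see why this example is WSS but not stationary, it is enough to compare a few moments of the two marginal laws. The following Python sketch does so in exact arithmetic; the function names `moment_pm_one` and `moment_uniform_sqrt3` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction

def moment_pm_one(k):
    """k-th moment of a RV taking on the values +1 and -1 equiprobably."""
    return Fraction((+1) ** k + (-1) ** k, 2)

def moment_uniform_sqrt3(k):
    """k-th moment of a RV uniform over [-sqrt(3), +sqrt(3)].

    For a uniform law on [-a, a] the k-th moment is a^k / (k + 1) when k is
    even and zero when k is odd; here a^2 = 3.
    """
    if k % 2 == 1:
        return Fraction(0)
    return Fraction(3 ** (k // 2), k + 1)

odd_sample = {k: moment_pm_one(k) for k in (1, 2, 4)}
even_sample = {k: moment_uniform_sqrt3(k) for k in (1, 2, 4)}
print(odd_sample, even_sample)
```

Both laws have zero mean and unit second moment, and the samples are independent, so (13.7a) through (13.7c) hold. The fourth moments differ (1 versus 9/5), so the odd and even samples do not have the same distribution and (13.2) fails.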


13.5 The Autocovariance Function 


Definition 13.5.1 (Autocovariance Function). The autocovariance function
$K_{XX}\colon \mathbb{Z} \to \mathbb{R}$ of a WSS discrete-time SP $(X_\nu)$ is defined by

$$K_{XX}(\eta) \triangleq \operatorname{Cov}[X_{\nu+\eta}, X_\nu], \quad \eta \in \mathbb{Z}. \tag{13.10}$$

Thus, the autocovariance function at $\eta$ is the covariance between two samples of
the process taken $\eta$ units of time apart. Note that because $(X_\nu)$ is WSS, the RHS
of (13.10) does not depend on $\nu$. Also, for WSS processes all samples are of equal
mean (13.7b), so

$$\begin{aligned} K_{XX}(\eta) &= \operatorname{Cov}[X_{\nu+\eta}, X_\nu] \\ &= E[X_{\nu+\eta} X_\nu] - E[X_{\nu+\eta}]\, E[X_\nu] \\ &= E[X_{\nu+\eta} X_\nu] - \bigl(E[X_1]\bigr)^2, \quad \eta \in \mathbb{Z}. \end{aligned}$$

In some engineering texts the autocovariance function is called “autocorrelation
function.” We prefer the former because $K_{XX}(\eta)$ does not measure the correlation
coefficient between $X_\nu$ and $X_{\nu+\eta}$ but rather the covariance. These concepts are
different also for zero-mean processes. Following (Grimmett and Stirzaker, 2001)
we define the autocorrelation function of a WSS process of nonzero variance as

$$\rho_{XX}(\eta) \triangleq \frac{\operatorname{Cov}[X_{\nu+\eta}, X_\nu]}{\operatorname{Var}[X_1]}, \quad \eta \in \mathbb{Z}, \tag{13.11}$$

i.e., as the correlation coefficient between $X_{\nu+\eta}$ and $X_\nu$. (Recall that for a WSS
process all samples are of the same variance (13.8), so for such a process the
denominator in (13.11) is equal to $\sqrt{\operatorname{Var}[X_\nu] \operatorname{Var}[X_{\nu+\eta}]}$.)
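The difference between (13.10) and (13.11) is easy to see on simulated data. The sketch below is an illustration under assumptions that are not part of the text: it uses the hypothetical process $X_\nu = Z_\nu + a Z_{\nu-1}$ with $(Z_\nu)$ IID standard Gaussians and $a = 0.8$, for which the autocovariance is $1 + a^2$ at lag zero, $a$ at lags $\pm 1$, and zero elsewhere, and it estimates both functions by sample averages.

```python
import random

def simulate_ma1(n, a, rng):
    """Return n samples of X_nu = Z_nu + a * Z_{nu-1}, (Z_nu) IID standard Gaussian."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n + 1)]
    return [z[i + 1] + a * z[i] for i in range(n)]

def sample_autocovariance(x, eta):
    """Empirical counterpart of Cov[X_{nu+eta}, X_nu] for a lag eta >= 0."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[i + eta] - mean) * (x[i] - mean)
               for i in range(n - eta)) / (n - eta)

a = 0.8
x = simulate_ma1(200_000, a, random.Random(0))
k_hat = [sample_autocovariance(x, eta) for eta in (0, 1, 2)]
rho_hat = [k / k_hat[0] for k in k_hat]  # empirical version of (13.11)
print(k_hat, rho_hat)
```

Although this process is of zero mean, the estimate of the autocovariance at lag one is close to 0.8 whereas the estimate of the autocorrelation at lag one is close to $0.8/1.64 \approx 0.49$; the two coincide only when the variance is one.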


Not every function from the integers to the reals is the autocovariance function of
some WSS SP. For example, the autocovariance function must be symmetric in the
sense that

$$K_{XX}(-\eta) = K_{XX}(\eta), \quad \eta \in \mathbb{Z}, \tag{13.12}$$

because, by (13.10),

$$\begin{aligned} K_{XX}(\eta) &= \operatorname{Cov}[X_{\nu+\eta}, X_\nu] \\ &= \operatorname{Cov}[X_{\nu'}, X_{\nu'-\eta}] \\ &= \operatorname{Cov}[X_{\nu'-\eta}, X_{\nu'}] \\ &= K_{XX}(-\eta), \quad \eta \in \mathbb{Z}, \end{aligned}$$

where in the second equality we defined $\nu' \triangleq \nu + \eta$, and where in the third equal-
ity we used the fact that for real random variables the covariance is symmetric:
$\operatorname{Cov}[X, Y] = \operatorname{Cov}[Y, X]$.


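Another property that the autocovariance function must satisfy is

$$\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K_{XX}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{R}, \tag{13.13}$$

because

$$\begin{aligned} \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K_{XX}(\nu - \nu') &= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \operatorname{Cov}[X_\nu, X_{\nu'}] \\ &= \operatorname{Cov}\Bigl[ \sum_{\nu=1}^{n} \alpha_\nu X_\nu,\ \sum_{\nu'=1}^{n} \alpha_{\nu'} X_{\nu'} \Bigr] \\ &= \operatorname{Var}\Bigl[ \sum_{\nu=1}^{n} \alpha_\nu X_\nu \Bigr] \\ &\ge 0. \end{aligned}$$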

It turns out that (13.12) and (13.13) fully characterize the autocovariance functions 
of discrete-time WSS stochastic processes in a sense that is made precise in the 
following theorem. 


Theorem 13.5.2 (Characterizing Autocovariance Functions).

(i) If $K_{XX}$ is the autocovariance function of some discrete-time WSS SP $(X_\nu)$,
then $K_{XX}$ must satisfy (13.12) & (13.13).

(ii) If $K\colon \mathbb{Z} \to \mathbb{R}$ is some function satisfying

$$K(-\eta) = K(\eta), \quad \eta \in \mathbb{Z} \tag{13.14}$$

and

$$\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K(\nu - \nu') \ge 0, \quad (n \in \mathbb{N},\ \alpha_1, \ldots, \alpha_n \in \mathbb{R}), \tag{13.15}$$

then there exists a discrete-time WSS SP $(X_\nu)$ whose autocovariance func-
tion $K_{XX}$ is given by $K_{XX}(\eta) = K(\eta)$ for all $\eta \in \mathbb{Z}$.


Proof. We have already proved Part (i). For a proof of Part (ii) see, for example,
(Doob, 1990, Chapter X, § 3, Theorem 3.1) or (Pourahmadi, 2001, Theorem 5.1 in
Section 5.1 and Section 9.7).²


A function $K\colon \mathbb{Z} \to \mathbb{R}$ satisfying (13.14) & (13.15) is called a positive definite
function. Such functions have been extensively studied in the literature, and in
Section 13.7 we shall give an alternative characterization of autocovariance func-
tions based on these studies. But first we introduce the power spectral density.


² For the benefit of readers who have already encountered Gaussian stochastic processes, we
mention here that if $K(\cdot)$ satisfies (13.14) & (13.15) then we can even find a Gaussian SP whose
autocovariance function is equal to $K(\cdot)$.


13.6 The Power Spectral Density Function


Roughly speaking, the power spectral density (PSD) of a discrete-time WSS
SP $(X_\nu)$ of autocovariance function $K_{XX}$ is an integrable function on the interval
$[-1/2, 1/2)$ whose $\eta$-th Fourier Series Coefficient is equal to $K_{XX}(\eta)$. Such a func-
tion does not always exist. When it does, it is unique in the sense that any two such
functions can only differ on a subset of the interval $[-1/2, 1/2)$ of Lebesgue measure
zero. (This follows because integrable functions on the interval $[-1/2, 1/2)$ that
have identical Fourier Series Coefficients can differ only on a subset of $[-1/2, 1/2)$
of Lebesgue measure zero; see Theorem A.2.3.) Consequently, we shall speak of
“the” PSD but try to remember that this does not always exist and that, when it
does, it is only unique in this restricted sense.


Definition 13.6.1 (Power Spectral Density). We say that the discrete-time WSS
SP $(X_\nu)$ is of power spectral density $S_{XX}$ if $S_{XX}$ is an integrable mapping
from the interval $[-1/2, 1/2)$ to the reals such that

$$K_{XX}(\eta) = \int_{-1/2}^{1/2} S_{XX}(\theta)\, e^{-i2\pi\eta\theta}\, d\theta, \quad \eta \in \mathbb{Z}. \tag{13.16}$$

But see also Note 13.6.5 ahead.


Note 13.6.2. We shall sometimes abuse notation and, rather than say that the
stochastic process $(X_\nu,\ \nu \in \mathbb{Z})$ is of PSD $S_{XX}$, we shall say that the autocovariance
function $K_{XX}$ is of PSD $S_{XX}$.


By considering the special case of $\eta = 0$ in (13.16) we obtain that

$$\operatorname{Var}[X_\nu] = K_{XX}(0) = \int_{-1/2}^{1/2} S_{XX}(\theta)\, d\theta, \quad \nu \in \mathbb{Z}. \tag{13.17}$$
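The relations (13.16) and (13.17) can be checked numerically for a concrete candidate PSD. The following sketch is an illustration under stated assumptions: the function `s_xx` (with the hypothetical parameter `A`) is chosen for this example only, and the integral is replaced by an equally spaced Riemann sum, which is accurate up to rounding for low-degree trigonometric polynomials such as this one.

```python
import cmath
import math

A = 0.8  # hypothetical parameter of this example

def s_xx(theta):
    """A candidate PSD on [-1/2, 1/2): integrable, symmetric, and nonnegative."""
    return 1 + A * A + 2 * A * math.cos(2 * math.pi * theta)

def fourier_coefficient(eta, n_grid=64):
    """Riemann sum for the integral of S(theta) exp(-i 2 pi eta theta), cf. (13.16)."""
    grid = (-0.5 + m / n_grid for m in range(n_grid))
    return sum(s_xx(th) * cmath.exp(-2j * cmath.pi * eta * th)
               for th in grid) / n_grid

k_xx = {eta: fourier_coefficient(eta) for eta in range(-2, 3)}
variance = k_xx[0].real  # (13.17): the variance is the integral of the PSD
print(variance, k_xx[1], k_xx[2])
```

Here the coefficients come out as $1 + A^2$ at $\eta = 0$, $A$ at $\eta = \pm 1$, and zero at $\eta = \pm 2$, so a process of this PSD has variance 1.64.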


The main result of the following proposition is that power spectral densities are 
nonnegative (except possibly on a set of Lebesgue measure zero). 


Proposition 13.6.3 (PSDs Are Nonnegative and Symmetric).

(i) If the WSS SP $(X_\nu,\ \nu \in \mathbb{Z})$ of autocovariance $K_{XX}$ is of PSD $S_{XX}$, then,
except on subsets of $(-1/2, 1/2)$ of Lebesgue measure zero,

$$S_{XX}(\theta) \ge 0 \tag{13.18}$$

and

$$S_{XX}(\theta) = S_{XX}(-\theta). \tag{13.19}$$

(ii) If the function $S\colon [-1/2, 1/2) \to \mathbb{R}$ is integrable, nonnegative, and symmetric
(in the sense that $S(\theta) = S(-\theta)$ for all $\theta \in (-1/2, 1/2)$), then there exists a
WSS SP $(X_\nu)$ whose PSD $S_{XX}$ is given by

$$S_{XX}(\theta) = S(\theta), \quad \theta \in [-1/2, 1/2).$$


Proof. The nonnegativity of the PSD (13.18) will be established later in the more
general setting of complex stochastic processes (Proposition 17.5.7 ahead). Here we
only prove the symmetry (13.19) and establish the second half of the proposition.


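That (13.19) holds (except on a set of Lebesgue measure zero) follows because $K_{XX}$
is symmetric. Indeed, for any $\eta \in \mathbb{Z}$ we have

$$\begin{aligned} \int_{-1/2}^{1/2} \bigl( S_{XX}(\theta) - S_{XX}(-\theta) \bigr)\, e^{-i2\pi\eta\theta}\, d\theta &= \int_{-1/2}^{1/2} S_{XX}(\theta)\, e^{-i2\pi\eta\theta}\, d\theta - \int_{-1/2}^{1/2} S_{XX}(-\theta)\, e^{-i2\pi\eta\theta}\, d\theta \\ &= K_{XX}(\eta) - K_{XX}(-\eta) \\ &= 0, \quad \eta \in \mathbb{Z}. \end{aligned} \tag{13.20}$$

Consequently, all the Fourier Series Coefficients of the function $\theta \mapsto S_{XX}(\theta) -
S_{XX}(-\theta)$ are zero, thus establishing that this function is zero except on a set of
Lebesgue measure zero (Theorem A.2.3).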


We next prove that if the function $S\colon [-1/2, 1/2) \to \mathbb{R}$ is symmetric, nonnegative,
and integrable, then it is the PSD of some WSS real SP. We cheat a bit because
our proof relies on Theorem 13.5.2, which we never proved. From Theorem 13.5.2
it follows that it suffices to establish that the sequence $K\colon \mathbb{Z} \to \mathbb{R}$ defined by

$$K(\eta) \triangleq \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta}\, d\theta, \quad \eta \in \mathbb{Z} \tag{13.21}$$

satisfies (13.14) & (13.15).
Verifying (13.14) is straightforward: by hypothesis, $S(\cdot)$ is symmetric so

$$\begin{aligned} K(-\eta) &= \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi(-\eta)\theta}\, d\theta \\ &= \int_{-1/2}^{1/2} S(-\varphi)\, e^{-i2\pi\eta\varphi}\, d\varphi \\ &= \int_{-1/2}^{1/2} S(\varphi)\, e^{-i2\pi\eta\varphi}\, d\varphi \\ &= K(\eta), \quad \eta \in \mathbb{Z}, \end{aligned}$$

where the first equality follows from (13.21); the second from the change of variable
$\varphi \triangleq -\theta$; the third from the symmetry of $S(\cdot)$, which implies that $S(-\varphi) = S(\varphi)$;
and the last equality again from (13.21).


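We next verify (13.15). To this end we fix arbitrary $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ and compute

$$\begin{aligned} \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K(\nu - \nu') &= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi(\nu - \nu')\theta}\, d\theta \\ &= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, e^{-i2\pi(\nu - \nu')\theta} \Bigr)\, d\theta \\ &= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu\, e^{-i2\pi\nu\theta}\, \alpha_{\nu'}\, e^{i2\pi\nu'\theta} \Bigr)\, d\theta \\ &= \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \alpha_\nu\, e^{-i2\pi\nu\theta} \Bigr) \Bigl( \sum_{\nu'=1}^{n} \alpha_{\nu'}\, e^{-i2\pi\nu'\theta} \Bigr)^{*}\, d\theta \\ &= \int_{-1/2}^{1/2} S(\theta)\, \Bigl| \sum_{\nu=1}^{n} \alpha_\nu\, e^{-i2\pi\nu\theta} \Bigr|^2\, d\theta \\ &\ge 0, \end{aligned} \tag{13.22}$$

where the first equality follows from (13.21); the subsequent equalities by simple
algebraic manipulation; and the final inequality from the nonnegativity of $S(\cdot)$.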


Corollary 13.6.4. If a discrete-time WSS SP $(X_\nu)$ has a PSD, then it also has a
PSD $S_{XX}$ for which (13.18) holds for every $\theta \in [-1/2, 1/2)$ and for which (13.19)
holds for every $\theta \in (-1/2, 1/2)$ (and not only outside subsets of Lebesgue measure
zero).


Proof. Suppose that $(X_\nu)$ is of PSD $S_{XX}$. Define the mapping $S\colon [-1/2, 1/2) \to \mathbb{R}$
by³

$$S(\theta) = \begin{cases} \tfrac{1}{2} \bigl( |S_{XX}(\theta)| + |S_{XX}(-\theta)| \bigr) & \text{if } \theta \in (-1/2, 1/2), \\ 1 & \text{if } \theta = -1/2. \end{cases} \tag{13.23}$$

By the proposition, $S_{XX}$ and $S(\cdot)$ differ only on a set of Lebesgue measure zero,
so they must have identical Fourier Series Coefficients. Since the Fourier Series
Coefficients of $S_{XX}$ agree with $K_{XX}$, it follows that so must those of $S(\cdot)$. Thus, $S(\cdot)$
is a PSD for $(X_\nu)$, and it is by (13.23) nonnegative on $[-1/2, 1/2)$ and symmetric
on $(-1/2, 1/2)$.


Note 13.6.5. In view of Corollary 13.6.4 we shall only say that $(X_\nu)$ is of PSD $S_{XX}$
if the function $S_{XX}$, in addition to being integrable and to satisfying (13.16), is
also nonnegative and symmetric.


As we have noted, not every WSS SP has a PSD. For example, the process defined
by

$$X_\nu \triangleq X, \quad \nu \in \mathbb{Z},$$

where $X$ is some zero-mean unit-variance random variable has the all-one auto-
covariance function $K_{XX}(\eta) = 1$, $\eta \in \mathbb{Z}$, and this all-one sequence cannot be
the Fourier Series Coefficients sequence of an integrable function because, by the
Riemann-Lebesgue lemma (Theorem A.2.4), the Fourier Series Coefficients of an
integrable function must converge to zero.⁴


³ Our choice of $S(-1/2)$ as 1 is arbitrary; any nonnegative value would do.

⁴ One could say that the PSD of this process is Dirac’s Delta, but we shall refrain from doing
so because we do not use Dirac’s Delta in this book and because there is not much to be gained
from this. (There exist processes that do not have a PSD even if one allows for Dirac’s Deltas.)


In general, it is very difficult to characterize the autocovariance functions having
a PSD. We know by the Riemann-Lebesgue lemma that such autocovariance func-
tions must tend to zero, but this necessary condition is not sufficient. A very useful
sufficient (but not necessary) condition is the following:


Proposition 13.6.6 (PSD when $K_{XX}$ Is Absolutely Summable). If the
autocovariance function $K_{XX}$ is absolutely summable, i.e.,

\[
\sum_{\eta=-\infty}^{\infty} |K_{XX}(\eta)| < \infty, \tag{13.24}
\]

then the function

\[
S(\theta) = \sum_{\eta=-\infty}^{\infty} K_{XX}(\eta)\, e^{i2\pi\eta\theta}, \quad \theta \in [-1/2, 1/2] \tag{13.25}
\]

is continuous, symmetric, nonnegative, and satisfies

\[
\int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta}\, d\theta = K_{XX}(\eta), \quad \eta \in \mathbb{Z}. \tag{13.26}
\]

Consequently, $S(\cdot)$ is a PSD for $K_{XX}$.

Proof. First note that because $|K_{XX}(\eta)\, e^{i2\pi\eta\theta}| = |K_{XX}(\eta)|$, it follows that (13.24)
guarantees that the sum in (13.25) converges uniformly and absolutely. And since
each term in the sum is a continuous function, the uniform convergence of the
sum guarantees that $S(\cdot)$ is continuous (Rudin, 1976, Chapter 7, Theorem 7.12).
Consequently,

\[
\int_{-1/2}^{1/2} |S(\theta)|\, d\theta < \infty, \tag{13.27}
\]

and it is meaningful to discuss the Fourier Series Coefficients of $S(\cdot)$.


We next prove that the Fourier Series Coefficients of $S(\cdot)$ are equal to $K_{XX}$, i.e.,
that (13.26) holds. This can be shown by swapping integration and summation
and using the orthonormality property

\[
\int_{-1/2}^{1/2} e^{i2\pi(\eta - \eta')\theta}\, d\theta = \mathrm{I}\{\eta = \eta'\}, \quad \eta, \eta' \in \mathbb{Z} \tag{13.28}
\]

as follows:

\[
\begin{aligned}
\int_{-1/2}^{1/2} S(\theta)\, e^{-i2\pi\eta\theta}\, d\theta
&= \int_{-1/2}^{1/2} \Bigl( \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta')\, e^{i2\pi\eta'\theta} \Bigr) e^{-i2\pi\eta\theta}\, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta') \int_{-1/2}^{1/2} e^{i2\pi\eta'\theta}\, e^{-i2\pi\eta\theta}\, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta') \int_{-1/2}^{1/2} e^{i2\pi(\eta'-\eta)\theta}\, d\theta \\
&= \sum_{\eta'=-\infty}^{\infty} K_{XX}(\eta')\, \mathrm{I}\{\eta' = \eta\} \\
&= K_{XX}(\eta), \quad \eta \in \mathbb{Z}.
\end{aligned}
\]

It remains to show that $S(\cdot)$ is symmetric, i.e., that $S(\theta) = S(-\theta)$, and that it is
nonnegative. The symmetry of $S(\cdot)$ follows directly from its definition (13.25) and
from the fact that $K_{XX}$, like every autocovariance function, is symmetric
(Theorem 13.5.2 (i)).

We next prove that $S(\cdot)$ is nonnegative. From (13.26) it follows that $S(\cdot)$ can
only be negative on a subset of the interval $[-1/2, 1/2)$ of Lebesgue measure zero
(Proposition 13.6.3 (i)). And since $S(\cdot)$ is continuous, this implies that $S(\cdot)$ is
nonnegative.
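As an illustration of Proposition 13.6.6, the sketch below assumes the example autocovariance $K_{XX}(\eta) = \rho^{|\eta|}$ with $\rho = 0.6$ (a choice made here, not one from the text). It evaluates a truncated version of the sum (13.25), compares it with the known closed form $(1-\rho^2)/(1 - 2\rho\cos(2\pi\theta) + \rho^2)$ of this geometric series, and recovers a few autocovariance values via (13.26) using a midpoint rule.

```python
import numpy as np

rho = 0.6                               # assumed example parameter, |rho| < 1
max_lag = 200                           # truncation point of the infinite sum
lags = np.arange(-max_lag, max_lag + 1)
K = rho ** np.abs(lags)                 # absolutely summable autocovariance

# Midpoint grid on [-1/2, 1/2).
theta = (np.arange(2048) + 0.5) / 2048 - 0.5

# Truncated version of (13.25): S(theta) = sum_eta K(eta) exp(i 2 pi eta theta).
S = (K[None, :] * np.exp(2j * np.pi * theta[:, None] * lags[None, :])).sum(axis=1)

# Closed form of the same series for this particular K.
S_closed = (1 - rho**2) / (1 - 2 * rho * np.cos(2 * np.pi * theta) + rho**2)

# Recover K(0), ..., K(3) from S via (13.26).
K_back = np.array([
    np.mean(S.real * np.exp(-2j * np.pi * eta * theta)).real for eta in range(4)
])
```

In this example the computed $S$ is real, strictly positive, and symmetric in $\theta$, in agreement with the proposition.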


13.7 The Spectral Distribution Function 


We next briefly discuss the case where $(X_\nu)$ does not necessarily have a power
spectral density function. We shall see that in this case too we can express the
autocovariance function as the Fourier Series of “something,” but this “something”
is not an integrable function. (It is, in fact, a measure.) The theorem will also yield
a characterization of nonnegative definite functions. The proof, which is based on
Herglotz’s Theorem, is omitted. The results of this section will not be used in
subsequent chapters.

Recall that a random variable $X$ taking value in the interval $[-\alpha, \alpha]$ is said to be
symmetric (or to have a symmetric distribution) if $\Pr[X \le -\xi] = \Pr[X \ge \xi]$ for
all $\xi \in [-\alpha, \alpha]$.

Theorem 13.7.1. A function $\rho\colon \mathbb{Z} \to \mathbb{R}$ is the autocorrelation function of a real
WSS SP if, and only if, there exists a symmetric random variable $\Theta$ taking value
in the interval $[-1/2, 1/2]$ such that

\[
\rho(\eta) = \mathsf{E}\bigl[ e^{-i2\pi\eta\Theta} \bigr], \quad \eta \in \mathbb{Z}. \tag{13.29}
\]

The cumulative distribution function of $\Theta$ is fully determined by $\rho$.

Proof. See (Doob, 1990, Chapter X, § 3, Theorem 3.2), (Pourahmadi, 2001,
Theorem 9.22), (Shiryaev, 1996, Chapter VI, § 1.1), or (Porat, 2008, Section 2.8).


This theorem also characterizes autocovariance functions: a function $K\colon \mathbb{Z} \to \mathbb{R}$
is the autocovariance function of a real WSS SP if, and only if, there exists a
symmetric random variable $\Theta$ taking value in the interval $[-1/2, 1/2]$ and some
constant $\alpha \ge 0$ such that

\[
K(\eta) = \alpha\, \mathsf{E}\bigl[ e^{-i2\pi\eta\Theta} \bigr], \quad \eta \in \mathbb{Z}. \tag{13.30}
\]

(By evaluating (13.30) at $\eta = 0$ we obtain that $\alpha = K(0)$, i.e., the variance of the
stochastic process.)

Equivalently, we can state the theorem as follows. If $(X_\nu)$ is a real WSS SP, then
its autocovariance function $K_{XX}$ can be expressed as

\[
K_{XX}(\eta) = \operatorname{Var}[X_1]\, \mathsf{E}\bigl[ e^{-i2\pi\eta\Theta} \bigr], \quad \eta \in \mathbb{Z} \tag{13.31}
\]

for some random variable $\Theta$ taking value in the interval $[-1/2, 1/2]$ according to
some symmetric distribution. If, additionally, $\operatorname{Var}[X_1] > 0$, then the cumulative
distribution function $F_\Theta(\cdot)$ of $\Theta$ is uniquely determined by $K_{XX}$.
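To make the representation (13.29) concrete, here is a small sketch under an assumption made only for illustration: $\Theta$ takes the two values $\pm\theta_0$ with probability $1/2$ each, with $\theta_0 = 0.1$. The resulting $\rho(\eta) = \cos(2\pi\eta\theta_0)$ does not tend to zero, so by the Riemann-Lebesgue argument used earlier it cannot come from a PSD, yet it should still be nonnegative definite. The code checks this on an $n \times n$ matrix of values $\rho(\nu - \nu')$.

```python
import numpy as np

theta0 = 0.1                                  # assumed example value
support = np.array([-theta0, theta0])         # symmetric two-point distribution
probs = np.array([0.5, 0.5])

def rho(eta):
    """rho(eta) = E[exp(-i 2 pi eta Theta)] for the two-point Theta above."""
    return np.sum(probs * np.exp(-2j * np.pi * eta * support)).real

n = 8
R = np.array([[rho(i - j) for j in range(n)] for i in range(n)])
eigs = np.linalg.eigvalsh(R)                  # eigenvalues of the symmetric matrix
```

Up to rounding, all eigenvalues are nonnegative, which is what the theorem predicts for any symmetric $\Theta$ on $[-1/2, 1/2]$.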


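Note 13.7.2.

(i) If the random variable $\Theta$ above has a symmetric density $f_\Theta(\cdot)$, then the
process is of PSD $\theta \mapsto \operatorname{Var}[X_1]\, f_\Theta(\theta)$. Indeed, by (13.31) we have for every
integer $\eta$

\[
\begin{aligned}
K_{XX}(\eta) &= \operatorname{Var}[X_1]\, \mathsf{E}\bigl[ e^{-i2\pi\eta\Theta} \bigr] \\
&= \operatorname{Var}[X_1] \int_{-1/2}^{1/2} f_\Theta(\theta)\, e^{-i2\pi\eta\theta}\, d\theta \\
&= \int_{-1/2}^{1/2} \bigl( \operatorname{Var}[X_1]\, f_\Theta(\theta) \bigr)\, e^{-i2\pi\eta\theta}\, d\theta.
\end{aligned}
\]

(ii) Some authors, e.g., (Grimmett and Stirzaker, 2001), refer to the cumulative
distribution function $F_\Theta(\cdot)$ of $\Theta$, i.e., to the mapping $\theta \mapsto \Pr[\Theta \le \theta]$, as
the Spectral Distribution Function of $(X_\nu)$. This, however, is not standard.
It is only in agreement with the more common usage in the case where
$\operatorname{Var}[X_1] = 1$.⁵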


13.8 Exercises 
Exercise 13.1 (Discrete-Time WSS Stochastic Processes). Prove Proposition 13.3.3. 


Exercise 13.2 (Mapping a Discrete-Time Stationary SP). Let $(X_\nu)$ be a stationary
discrete-time SP, and let $g\colon \mathbb{R} \to \mathbb{R}$ be some arbitrary (Borel measurable) function. For
every $\nu \in \mathbb{Z}$, let $Y_\nu = g(X_\nu)$. Prove that the discrete-time SP $(Y_\nu)$ is stationary.


Exercise 13.3 (Mapping a Discrete-Time WSS SP). Let $(X_\nu)$ be a WSS discrete-time
SP, and let $g\colon \mathbb{R} \to \mathbb{R}$ be some arbitrary (Borel measurable) bounded function. For every
$\nu \in \mathbb{Z}$, let $Y_\nu = g(X_\nu)$. Must the SP $(Y_\nu)$ be WSS?


Exercise 13.4 (A Sliding-Window Mapping of a Stationary SP). Let $(X_\nu)$ be a stationary
discrete-time SP, and let $g\colon \mathbb{R}^2 \to \mathbb{R}$ be some arbitrary (Borel measurable) function. For
every $\nu \in \mathbb{Z}$ define $Y_\nu = g(X_{\nu-1}, X_\nu)$. Must $(Y_\nu)$ be stationary?


⁵ The more common definition is that $\theta \mapsto \operatorname{Var}[X_1]\, \Pr[\Theta \le \theta]$ is the spectral measure or
spectral distribution function. But this is not a distribution function in the probabilistic sense
because its value at $\theta = \infty$ is $\operatorname{Var}[X_1]$, which may be different from one.

Exercise 13.5 (A Sliding-Window Mapping of a WSS SP). Let $(X_\nu)$ be a WSS
discrete-time SP, and let $g\colon \mathbb{R}^2 \to \mathbb{R}$ be some arbitrary bounded (Borel measurable) function. For
every $\nu \in \mathbb{Z}$ define $Y_\nu = g(X_{\nu-1}, X_\nu)$. Must $(Y_\nu)$ be WSS?


Exercise 13.6 (Existence of a SP). For which values of $\alpha, \beta \in \mathbb{R}$ is the function

\[
K_{XX}(m) =
\begin{cases}
1 & \text{if } m = 0, \\
\alpha & \text{if } |m| = 1, \\
\beta & \text{if } |m| = 2, \\
0 & \text{otherwise},
\end{cases}
\quad m \in \mathbb{Z}
\]

the autocovariance function of some WSS SP $(X_\nu, \nu \in \mathbb{Z})$?


Exercise 13.7 (Dilating a Stationary SP). Let $(X_\nu)$ be a stationary discrete-time SP, and
define $Y_\nu = X_{2\nu}$ for every $\nu \in \mathbb{Z}$. Must $(Y_\nu)$ be stationary?


Exercise 13.8 (Inserting Zeros Periodically). Let $(X_\nu)$ be a stationary discrete-time SP,
and let the RV $U$ be independent of it and take on the values 0 and 1 equiprobably. Define
for every $\nu \in \mathbb{Z}$

\[
Y_\nu =
\begin{cases}
0 & \text{if } \nu \text{ is odd}, \\
X_{\nu/2} & \text{if } \nu \text{ is even},
\end{cases}
\quad \text{and} \quad Z_\nu = Y_{\nu + U}. \tag{13.32}
\]

Under what conditions is $(Y_\nu)$ stationary? Under what conditions is $(Z_\nu)$ stationary?


Exercise 13.9 (The Autocovariance Function of a Dilated WSS SP). Let $(X_\nu)$ be a WSS
discrete-time SP of autocovariance function $K_{XX}$. Define $Y_\nu = X_{2\nu}$ for every $\nu \in \mathbb{Z}$. Must
$(Y_\nu)$ be WSS? If so, express its autocovariance function $K_{YY}$ in terms of $K_{XX}$.


Exercise 13.10 (Inserting Zeros Periodically: the Autocovariance Function). Let $(X_\nu)$ be
a WSS discrete-time SP of autocovariance function $K_{XX}$, and let the RV $U$ be independent
of it and take on the values 0 and 1 equiprobably. Define $(Z_\nu)$ as in (13.32). Must $(Z_\nu)$
be WSS? If yes, express its autocovariance function in terms of $K_{XX}$.


Exercise 13.11 (Stationary But Not WSS). Construct a discrete-time stationary SP that
is not WSS.


Exercise 13.12 (Complex Coefficients). Show that (13.13) will hold for complex numbers
$\alpha_1, \ldots, \alpha_n$ provided that we replace the product $\alpha_\nu \alpha_{\nu'}$ with $\alpha_\nu \alpha_{\nu'}^{*}$. That is, show that if
$K_{XX}$ is the autocovariance function of a real discrete-time WSS SP, then

\[
\sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}^{*}\, K_{XX}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C}.
\]


Chapter 14 


Energy and Power in PAM 


14.1 Introduction 


Energy is an important resource in Digital Communications. The rate at which 
it is transmitted—the “transmit power”—is critical in battery-operated devices. 
In satellite applications it is a major consideration in determining the size of the 
required solar panels, and in wireless systems it influences the interference that one 
system causes to another. In this chapter we shall discuss the power in PAM signals. 
To define power we shall need some modeling trickery which will allow us to pretend
that the system has been operating since “time $-\infty$” and that it will continue
to operate indefinitely. Our definitions and derivations will be mathematically
somewhat informal. A more formal account for readers with background in Measure 
Theory is provided in Section 14.6. 


Before discussing power we begin with a discussion of the expected energy in trans- 
mitting a finite number of bits. 


14.2 Energy in PAM


We begin with a seemingly completely artificial problem. Suppose that $K$ independent
data bits $D_1, \ldots, D_K$, each taking on the values 0 and 1 equiprobably,
are mapped by a mapping $\text{enc}\colon \{0,1\}^K \to \mathbb{R}^N$ to an $N$-tuple of real numbers
$(X_1, \ldots, X_N)$, where $X_\ell$ is the $\ell$-th component of the $N$-tuple $\text{enc}(D_1, \ldots, D_K)$.
Suppose further that the symbols $X_1, \ldots, X_N$ are then mapped to the waveform

\[
X(t) = A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{14.1}
\]

where $g \in \mathcal{L}_2$ is an energy-limited real pulse shape, $A > 0$ is a scaling factor, and
$T_s > 0$ is the baud period. We seek the expected energy in the waveform $X(\cdot)$.

We assume that $X(\cdot)$ corresponds to the voltage across a unit-load or to the current
through a unit-load, so the transmitted energy is the time integral of the mapping
$t \mapsto X^2(t)$. Because the data bits are random variables, the signal $X(\cdot)$ is a
stochastic process. Its energy $\int_{-\infty}^{\infty} X^2(t)\, dt$ is thus a random variable.¹ If $(\Omega, \mathcal{F}, P)$
is the probability space under consideration, then this RV is the mapping from $\Omega$
to $\mathbb{R}$ defined by

\[
\omega \mapsto \int_{-\infty}^{\infty} X^2(\omega, t)\, dt.
\]

This RV’s expectation, the expected energy, is denoted by $E$ and is given by

\[
E \triangleq \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} X^2(t)\, dt \Bigr]. \tag{14.2}
\]

Note that even though we are considering the transmission of a finite number of
symbols ($N$), the waveform $X(\cdot)$ may extend in time from $-\infty$ to $+\infty$.


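We next derive an explicit expression for $E$. Starting from (14.2) and using (14.1),

\[
\begin{aligned}
E &= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} X^2(t)\, dt \Bigr] \\
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr)^{2} dt \Bigr] \\
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr) \Bigl( A \sum_{\ell'=1}^{N} X_{\ell'}\, g(t - \ell' T_s) \Bigr) dt \Bigr] \\
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} X_\ell X_{\ell'}\, g(t - \ell T_s)\, g(t - \ell' T_s)\, dt \Bigr] \\
&= A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, g(t - \ell T_s)\, g(t - \ell' T_s)\, dt \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}] \int_{-\infty}^{\infty} g(t - \ell T_s)\, g(t - \ell' T_s)\, dt \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, R_{gg}\bigl( (\ell - \ell') T_s \bigr),
\end{aligned}
\tag{14.3}
\]

where $R_{gg}$ is the self-similarity function of the pulse $g(\cdot)$ (Section 11.2). Here the
first equality follows from (14.2); the second from (14.1); the third by writing the
square of a number as its product with itself ($\xi^2 = \xi\xi$); the fourth by writing the
product of sums as the double sum of products; the fifth by swapping expectation
with integration and by the linearity of expectation; the sixth by swapping integration
and summation; and the final equality by the definition of the self-similarity
function (Definition 11.2.1).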
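The double sum (14.3) is easy to evaluate once the second moments $\mathsf{E}[X_\ell X_{\ell'}]$ and the self-similarity function are known. The following is a sketch under assumptions made here for illustration only: a rectangular pulse of width $1.5\,T_s$ (so that adjacent pulses overlap), and a hypothetical $(K, N) = (2, 3)$ encoder `enc` that repeats the first antipodal symbol. It compares (14.3) with a brute-force average of the waveform energy over all four data-bit patterns.

```python
import itertools
import numpy as np

A, Ts, W = 2.0, 1.0, 1.5          # assumed amplitude, baud period, pulse width

def g(t):
    """Rectangular pulse of width W (an assumed example pulse)."""
    return ((t >= 0) & (t < W)).astype(float)

def Rgg(tau):
    """Self-similarity function of the rectangular pulse above."""
    return np.maximum(W - np.abs(tau), 0.0)

def enc(bits):
    """Hypothetical (K=2, N=3) encoder: the first antipodal symbol is sent twice."""
    x1, x2 = 1 - 2 * bits[0], 1 - 2 * bits[1]
    return np.array([x1, x1, x2], dtype=float)

N = 3
codewords = [enc(b) for b in itertools.product((0, 1), repeat=2)]

# Second-moment matrix E[X_l X_l'] under equiprobable, independent data bits.
M = np.mean([np.outer(x, x) for x in codewords], axis=0)

# Expected energy via the double sum (14.3).
ell = np.arange(1, N + 1)
E_formula = A**2 * np.sum(M * Rgg((ell[:, None] - ell[None, :]) * Ts))

# Brute force: integrate X^2(t) on a midpoint grid for every codeword and average.
dt = 0.01
t = (np.arange(600) + 0.5) * dt   # midpoints covering [0, 6)

def energy(x):
    X = A * sum(x[l - 1] * g(t - l * Ts) for l in ell)
    return np.sum(X**2) * dt

E_brute = np.mean([energy(x) for x in codewords])
```

Because the repeated symbol makes $\mathsf{E}[X_1 X_2] \ne 0$ and the pulses overlap, the cross terms in (14.3) contribute here; with a pulse satisfying (14.10) or with uncorrelated symbols they would vanish.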


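Using Proposition 11.2.2 (iv) we can also express $R_{gg}$ as

\[
R_{gg}(\tau) = \int_{-\infty}^{\infty} |\hat{g}(f)|^2\, e^{i2\pi f\tau}\, df, \quad \tau \in \mathbb{R} \tag{14.4}
\]

and hence rewrite (14.3) as

\[
E = A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i2\pi f(\ell - \ell') T_s}\, |\hat{g}(f)|^2\, df. \tag{14.5}
\]

¹ There are some slight measure-theoretic mathematical technicalities that we are sweeping
under the rug. Those are resolved in Section 14.6.

We define the energy per bit as

\[
E_b \triangleq \frac{E}{K} \quad \left[ \frac{\text{energy}}{\text{bit}} \right] \tag{14.6}
\]

and the energy per real symbol as

\[
E_s \triangleq \frac{E}{N} \quad \left[ \frac{\text{energy}}{\text{real symbol}} \right]. \tag{14.7}
\]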

As we shall see in Section 14.5.2, if infinite data are transmitted using the
binary-to-reals $(K, N)$ block encoder $\text{enc}(\cdot)$, then the resulting transmitted power $P$ is given
by

\[
P = \frac{E_s}{T_s}. \tag{14.8}
\]

This result will be proved in Section 14.5.2 after we carefully define the average
power. The units work out because if we think of $T_s$ as having units of seconds per
real symbol then:

\[
\frac{E_s \left[ \dfrac{\text{energy}}{\text{real symbol}} \right]}{T_s \left[ \dfrac{\text{second}}{\text{real symbol}} \right]} = \frac{E_s}{T_s} \left[ \frac{\text{energy}}{\text{second}} \right]. \tag{14.9}
\]


Expression (14.3) for the expected energy $E$ is greatly simplified in two cases that
we discuss next. The first is when the pulse shape $g$ satisfies the orthogonality
condition

\[
\int_{-\infty}^{\infty} g(t)\, g(t - \kappa T_s)\, dt = \|g\|_2^2\; \mathrm{I}\{\kappa = 0\}, \quad \kappa \in \{0, 1, \ldots, N-1\}. \tag{14.10}
\]

In this case (14.3) simplifies to

\[
E = A^2\, \|g\|_2^2 \sum_{\ell=1}^{N} \mathsf{E}[X_\ell^2], \quad \bigl( \{ t \mapsto g(t - \ell T_s) \}_{\ell=1}^{N} \text{ orthogonal} \bigr). \tag{14.11}
\]

(In this case one need not even go through the calculation leading to (14.3); the
result simply follows from (14.1) and the Pythagorean Theorem (Theorem 4.5.2).)

The second case for which the computation of $E$ is simplified is when the distribution
of $D_1, \ldots, D_K$ and the mapping $\text{enc}(\cdot)$ result in the real symbols $X_1, \ldots, X_N$
being of zero mean and uncorrelated:²

\[
\mathsf{E}[X_\ell] = 0, \quad \ell \in \{1, \ldots, N\} \tag{14.12a}
\]

and

\[
\mathsf{E}[X_\ell X_{\ell'}] = \mathsf{E}[X_\ell^2]\; \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, N\}. \tag{14.12b}
\]

² Actually, it suffices that (14.12b) hold; (14.12a) is not needed.

In this case too (14.3) simplifies to

\[
E = A^2\, \|g\|_2^2 \sum_{\ell=1}^{N} \mathsf{E}[X_\ell^2], \quad \bigl( (X_\ell,\ \ell \in \mathbb{Z}) \text{ zero-mean \& uncorrelated} \bigr). \tag{14.13}
\]

14.3 Defining the Power in PAM 


If $(X(t),\ t \in \mathbb{R})$ is a continuous-time stochastic process describing the voltage
across a unit-load or the current through a unit-load, then it is reasonable to
define the power $P$ in $(X(t),\ t \in \mathbb{R})$ as the limit

\[
P \triangleq \lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\Bigl[ \int_{-T}^{T} X^2(t)\, dt \Bigr]. \tag{14.14}
\]

But there is a problem. Over its lifetime, a communication system is only used
to transmit a finite number of bits, and it only sends a finite amount of energy.
Consequently, if $(X(t),\ t \in \mathbb{R})$ corresponds to the transmitted waveform over the
system’s lifetime, then $P$ as defined in (14.14) will always end up being zero. The
definition in (14.14) is thus useless when discussing the transmission of a finite
number of bits.

To define power in a useful way we need some modeling trickery. Instead of thinking
about the encoder as producing a finite number of symbols, we should now pretend
that the encoder produces an infinite sequence of symbols $(X_\ell,\ \ell \in \mathbb{Z})$, which are
then mapped to the infinite sum

\[
X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{14.15}
\]

For the waveform in (14.15), the definition of $P$ in (14.14) makes perfect sense.
Philosophically speaking, the modeling trickery we employ corresponds to measuring
power on a time scale much greater than the signaling period $T_s$ but much
shorter than the system’s lifetime.


But philosophy aside, there are still two problems we must address: how to model
the generation of the infinite sequence $(X_\ell,\ \ell \in \mathbb{Z})$, and how to guarantee that
the sum in (14.15) converges for every $t \in \mathbb{R}$. We begin with the latter. If $g$ is of
finite duration, then at every epoch $t \in \mathbb{R}$ only a finite number of terms in (14.15)
are nonzero and convergence is thus guaranteed. But we do not want to restrict
ourselves to finite-duration pulse shapes because those, by Theorem 6.8.2, cannot
be bandlimited. Instead, to guarantee convergence, we shall assume throughout
that the following conditions both hold:

1) The symbols $(X_\ell,\ \ell \in \mathbb{Z})$ are uniformly bounded in the sense that there
exists some constant $\gamma$ such that

\[
|X_\ell| \le \gamma, \quad \ell \in \mathbb{Z}. \tag{14.16}
\]

2) The pulse shape $t \mapsto g(t)$ decays faster than $1/t$ in the sense that there exist
positive constants $\alpha, \beta > 0$ such that

\[
|g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R}. \tag{14.17}
\]

Using the fact that the sum $\sum_{n \ge 1} n^{-(1+\alpha)}$ converges whenever $\alpha > 0$ (Rudin,
1976, Theorem 3.28), it is not difficult to show that if both (14.16) and (14.17)
hold, then the infinite sum (14.15) converges at every epoch $t \in \mathbb{R}$.

[Figure 14.1: Bi-Infinite Block Encoding. The data blocks $(D_{-K+1}, \ldots, D_0)$, $(D_1, \ldots, D_K)$, $(D_{K+1}, \ldots, D_{2K})$ are mapped by $\text{enc}(\cdot)$ to the symbol blocks $(X_{-N+1}, \ldots, X_0)$, $(X_1, \ldots, X_N)$, $(X_{N+1}, \ldots, X_{2N})$.]


As to the generation of $(X_\ell,\ \ell \in \mathbb{Z})$, we shall consider three scenarios. In the
first, which we analyze in Section 14.5.1, we ignore this issue and simply assume
that $(X_\ell,\ \ell \in \mathbb{Z})$ is a WSS discrete-time SP of a given autocovariance function.
In the second scenario, which we analyze in Section 14.5.2, we tweak the
block-encoding mode that we introduced in Section 10.4 to account for a bi-infinite data
sequence. We call this tweaked mode bi-infinite block encoding and describe
it more precisely in Section 14.5.2. It is illustrated in Figure 14.1. Finally, the
third scenario, which we analyze in Section 14.5.3, is similar to the first except
that we relax some of the statistical assumptions on $(X_\ell,\ \ell \in \mathbb{Z})$. But we only
treat the case where the time shifts of the pulse shape by integer multiples of $T_s$
are orthonormal.

Except in the third scenario, we shall only analyze the power in the stochastic
process (14.15) assuming that the symbols $(X_\ell,\ \ell \in \mathbb{Z})$ are of zero mean

\[
\mathsf{E}[X_\ell] = 0, \quad \ell \in \mathbb{Z}. \tag{14.18}
\]

This not only simplifies the analysis but also makes engineering sense, because it
guarantees that $(X(t),\ t \in \mathbb{R})$ is centered

\[
\mathsf{E}[X(t)] = 0, \quad t \in \mathbb{R}, \tag{14.19}
\]

and, for the reasons that we outline in Section 14.4, transmitting zero-mean
waveforms is usually power efficient.

[Figure 14.2: two block diagrams. In the first, the transmitter sends $X(t)$, the channel adds $N(t)$, and the receiver operates on $X(t) + N(t)$. In the second, the transmitter sends $X(t) - c(t)$, the channel adds $N(t)$, and $c(t)$ is added back before the result is passed to the original receiver.]

Figure 14.2: The above two systems have identical performance. In the former
the transmitted power is the power in $t \mapsto X(t)$ whereas in the second it is the
power in $t \mapsto X(t) - c(t)$.


14.4 On the Mean of Transmitted Waveforms 


We next explain why the transmitted waveforms in digital communications are
usually designed to be of zero mean.³ We focus on the case where the transmitted
signal suffers only from an additive disturbance. The key observation is that given
any transmitter that transmits the SP $(X(t),\ t \in \mathbb{R})$ and any receiver, we can
design a new transmitter that transmits the waveform $t \mapsto X(t) - c(t)$ and a
new receiver with identical performance. Here $c(\cdot)$ is any deterministic signal.
Indeed, the new receiver can simply add $c(\cdot)$ to the received signal and then pass
on the result to the old receiver. That the old and the new systems have identical
performance follows by noting that if $(N(t),\ t \in \mathbb{R})$ is the added disturbance, then
the received signal on which the old receiver operates is given by $t \mapsto X(t) + N(t)$.
And the received signal in the new system is $t \mapsto X(t) - c(t) + N(t)$, so after we
add $c(\cdot)$ to this signal we obtain the signal $X(t) + N(t)$, which is equal to the signal
that the old receiver operated on. Thus, the performance of a system transmitting
$X(\cdot)$ can be mimicked on a system transmitting $X(\cdot) - c(\cdot)$ by simply adding $c(\cdot)$
at the receiver. See Figure 14.2.

The addition at the receiver of $c(\cdot)$ entails no change in the transmitted power.
Therefore, if a system transmits $X(\cdot)$, then we might be able to improve its power
efficiency without hurting its performance by cleverly choosing $c(\cdot)$ so that the
power in $X(\cdot) - c(\cdot)$ is smaller than the power in $X(\cdot)$ and by then transmitting
$t \mapsto X(t) - c(t)$ instead of $t \mapsto X(t)$. The only additional change we would need
to make is to add $c(\cdot)$ at the receiver.


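How should we choose $c(\cdot)$? To answer this we shall need the following lemma.

³ This, however, is not the case with some wireless systems that transmit training sequences
to help the receiver learn the channel and acquire timing information.

Lemma 14.4.1. If $W$ is a random variable of finite variance, then

\[
\mathsf{E}\bigl[ (W - c)^2 \bigr] \ge \operatorname{Var}[W], \quad c \in \mathbb{R} \tag{14.20}
\]

with equality if, and only if,

\[
c = \mathsf{E}[W]. \tag{14.21}
\]

Proof.

\[
\begin{aligned}
\mathsf{E}\bigl[ (W - c)^2 \bigr]
&= \mathsf{E}\Bigl[ \bigl( (W - \mathsf{E}[W]) + (\mathsf{E}[W] - c) \bigr)^2 \Bigr] \\
&= \mathsf{E}\bigl[ (W - \mathsf{E}[W])^2 \bigr] + 2\, (\mathsf{E}[W] - c)\, \mathsf{E}\bigl[ W - \mathsf{E}[W] \bigr] + (\mathsf{E}[W] - c)^2 \\
&= \mathsf{E}\bigl[ (W - \mathsf{E}[W])^2 \bigr] + (\mathsf{E}[W] - c)^2 \\
&\ge \mathsf{E}\bigl[ (W - \mathsf{E}[W])^2 \bigr] \\
&= \operatorname{Var}[W],
\end{aligned}
\]

with equality if, and only if, $c = \mathsf{E}[W]$.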
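Lemma 14.4.1 can be illustrated with a small exact computation. The sketch below assumes a three-point distribution for $W$ (values and probabilities chosen here purely as an example), evaluates $\mathsf{E}[(W - c)^2]$ on a grid of candidate constants $c$, and checks both the decomposition used in the proof and the location of the minimum.

```python
import numpy as np

# Assumed example distribution of W (not from the text).
values = np.array([-1.0, 0.5, 3.0])
probs = np.array([0.2, 0.5, 0.3])

mean = np.sum(probs * values)
var = np.sum(probs * (values - mean) ** 2)

def mse(c):
    """E[(W - c)^2] for the discrete distribution above."""
    return np.sum(probs * (values - c) ** 2)

cs = np.linspace(-2.0, 4.0, 601)          # candidate constants, step 0.01
mses = np.array([mse(c) for c in cs])
best_c = cs[np.argmin(mses)]              # grid point with the smallest value
```

On the grid, the minimizing constant coincides with $\mathsf{E}[W]$ up to the grid resolution, and the excess over $\operatorname{Var}[W]$ equals $(\mathsf{E}[W] - c)^2$.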


With the aid of Lemma 14.4.1 we can now choose $c(\cdot)$ to minimize the power in
$t \mapsto X(t) - c(t)$ as follows. Keeping the definition of power (14.14) in mind, we
study

\[
\frac{1}{2T} \int_{-T}^{T} \mathsf{E}\bigl[ (X(t) - c(t))^2 \bigr]\, dt
\]

and note that this expression is minimized over all choices of the waveform $c(\cdot)$ by
minimizing the integrand, i.e., by choosing at every epoch $t$ the value of $c(t)$ to be
the one that minimizes $\mathsf{E}\bigl[ (X(t) - c(t))^2 \bigr]$. By Lemma 14.4.1 this corresponds to
choosing $c(t)$ to be $\mathsf{E}[X(t)]$. It is thus optimal to choose $c(\cdot)$ as

\[
c(t) = \mathsf{E}[X(t)], \quad t \in \mathbb{R}. \tag{14.22}
\]

This choice results in the transmitted waveform being $t \mapsto X(t) - \mathsf{E}[X(t)]$, i.e., in
the transmitted waveform being of zero mean.

Stated differently, if in a given system the transmitted waveform is not of zero
mean, then a new system can be built that transmits a waveform of lower (or
equal) average power and whose performance on any additive noise channel is
identical.


14.5 Computing the Power in PAM 


We proceed to compute the power in the signal

\[
X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R} \tag{14.23}
\]

under various assumptions on the bi-infinite random sequence $(X_\ell,\ \ell \in \mathbb{Z})$. We
assume throughout that Conditions (14.16) & (14.17) are satisfied so the infinite
sum converges at every epoch $t \in \mathbb{R}$. The power $P$ is defined as in (14.14).⁴


14.5.1 $(X_\ell)$ Is Zero-Mean and WSS


Here we compute the power in the signal (14.23) when $(X_\ell,\ \ell \in \mathbb{Z})$ is a centered
WSS SP of autocovariance function $K_{XX}$:

\[
\mathsf{E}[X_\ell] = 0, \quad \ell \in \mathbb{Z}, \tag{14.24a}
\]

\[
\mathsf{E}[X_\ell X_{\ell+m}] = K_{XX}(m), \quad \ell, m \in \mathbb{Z}. \tag{14.24b}
\]

We further assume that the pulse shape satisfies the decay condition (14.17) and
that the process $(X_\ell,\ \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16).

We begin by calculating the expected energy of $X(\cdot)$ in a half-open interval $[\tau, \tau + T_s)$
of length $T_s$ and by showing that this expected energy does not depend on $\tau$, i.e.,
that the expected energies in all intervals of length $T_s$ are identical. We calculate
the energy in the interval $[\tau, \tau + T_s)$ as follows:


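\[
\begin{aligned}
\int_{\tau}^{\tau + T_s} \mathsf{E}\bigl[ X^2(t) \bigr]\, dt
&= A^2 \int_{\tau}^{\tau + T_s} \mathsf{E}\Bigl[ \Bigl( \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s) \Bigr)^{2} \Bigr]\, dt \qquad (14.25) \\
&= A^2 \int_{\tau}^{\tau + T_s} \mathsf{E}\Bigl[ \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} X_\ell X_{\ell'}\, g(t - \ell T_s)\, g(t - \ell' T_s) \Bigr]\, dt \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[X_\ell X_{\ell'}]\, g(t - \ell T_s)\, g(t - \ell' T_s)\, dt \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{\ell=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \mathsf{E}[X_\ell X_{\ell+m}]\, g(t - \ell T_s)\, g\bigl(t - (\ell + m) T_s\bigr)\, dt \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} g(t - \ell T_s)\, g\bigl(t - (\ell + m) T_s\bigr)\, dt \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} \int_{\tau - \ell T_s}^{\tau + T_s - \ell T_s} g(t')\, g(t' - m T_s)\, dt' \qquad (14.26) \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m) \int_{-\infty}^{\infty} g(t')\, g(t' - m T_s)\, dt' \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(m T_s), \quad \tau \in \mathbb{R}, \qquad (14.27)
\end{aligned}
\]

where the first equality follows by the structure of $X(\cdot)$ (14.15); the second by
writing $X^2(t)$ as $X(t)\, X(t)$ and rearranging terms; the third by the linearity of
the expectation, which allows us to swap the double sum and the expectation
and to take the deterministic term $g(t - \ell T_s)\, g(t - \ell' T_s)$ outside the expectation;
the fourth by defining $m \triangleq \ell' - \ell$; the fifth by (14.24b); the sixth by defining
$t' \triangleq t - \ell T_s$; the seventh by noting that the integrals of a function over all the
intervals $[\tau - \ell T_s, \tau - \ell T_s + T_s)$ sum to the integral over the entire real line; and the
final by the definition of the self-similarity function $R_{gg}$ (Section 11.2).

⁴ A general mathematical definition of the power of a stochastic process is given in
Definition 14.6.1 ahead.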


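Note that, indeed, the RHS of (14.27) does not depend on the epoch $\tau$ at which
the length-$T_s$ time interval starts. This observation will now help us to compute
the power in $X(\cdot)$. Since the interval $[-T, +T)$ contains $\lfloor (2T)/T_s \rfloor$ disjoint intervals
of the form $[\tau, \tau + T_s)$, and since it is contained in the union of $\lceil (2T)/T_s \rceil$ such
intervals, it follows that

\[
\Bigl\lfloor \frac{2T}{T_s} \Bigr\rfloor\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau + T_s} X^2(t)\, dt \Bigr]
\le \mathsf{E}\Bigl[ \int_{-T}^{T} X^2(t)\, dt \Bigr]
\le \Bigl\lceil \frac{2T}{T_s} \Bigr\rceil\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau + T_s} X^2(t)\, dt \Bigr], \tag{14.28}
\]

where we use $\lfloor \xi \rfloor$ to denote the greatest integer smaller than or equal to $\xi$ (e.g.,
$\lfloor 4.2 \rfloor = 4$), and where we use $\lceil \xi \rceil$ to denote the smallest integer that is greater than
or equal to $\xi$ (e.g., $\lceil 4.2 \rceil = 5$) so

\[
\xi - 1 < \lfloor \xi \rfloor \le \lceil \xi \rceil < \xi + 1, \quad \xi \in \mathbb{R}. \tag{14.29}
\]

Note that from (14.29) and the Sandwich Theorem it follows that

\[
\lim_{T \to \infty} \frac{1}{2T} \Bigl\lfloor \frac{2T}{T_s} \Bigr\rfloor = \lim_{T \to \infty} \frac{1}{2T} \Bigl\lceil \frac{2T}{T_s} \Bigr\rceil = \frac{1}{T_s}. \tag{14.30}
\]

Dividing (14.28) by $2T$ and using (14.30) we obtain that

\[
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\Bigl[ \int_{-T}^{T} X^2(t)\, dt \Bigr] = \frac{1}{T_s}\, \mathsf{E}\Bigl[ \int_{\tau}^{\tau + T_s} X^2(t)\, dt \Bigr],
\]

which combines with (14.27) to yield

\[
P = \frac{1}{T_s}\, A^2 \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(m T_s). \tag{14.31}
\]

The power $P$ can be alternatively expressed in the frequency domain using (14.31)
and (14.4) as

\[
P = \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i2\pi f m T_s}\, |\hat{g}(f)|^2\, df. \tag{14.32}
\]

An important special case of (14.31) is when the symbols $(X_\ell)$ are zero-mean,
uncorrelated, and of equal variance $\sigma_X^2$. In this case $K_{XX}(m) = \sigma_X^2\, \mathrm{I}\{m = 0\}$, and
the only nonzero term in (14.31) is the term corresponding to $m = 0$ so

\[
P = \frac{1}{T_s}\, A^2\, \|g\|_2^2\, \sigma_X^2, \quad \bigl( (X_\ell) \text{ centered, variance } \sigma_X^2, \text{ uncorrelated} \bigr). \tag{14.33}
\]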
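The time-domain expression (14.31) and the frequency-domain expression (14.32) can be compared numerically. The sketch below rests on assumptions chosen here for convenience: a Gaussian pulse $g(t) = e^{-\pi t^2}$, for which $\hat{g}(f) = e^{-\pi f^2}$ and $R_{gg}(\tau) = e^{-\pi\tau^2/2}/\sqrt{2}$, and the autocovariance $K_{XX}(m) = \rho^{|m|}$, which arises, for example, from a stationary $\pm 1$-valued Markov chain and is therefore compatible with the boundedness condition (14.16). The infinite sums are truncated and the integral is replaced by a Riemann sum.

```python
import numpy as np

A, Ts, rho = 1.5, 0.8, 0.5           # assumed amplitude, baud period, correlation
m = np.arange(-50, 51)               # truncated range of lags
Kxx = rho ** np.abs(m)               # assumed autocovariance of the symbols

def Rgg(tau):
    """Self-similarity function of the Gaussian pulse g(t) = exp(-pi t^2)."""
    return np.exp(-np.pi * tau**2 / 2) / np.sqrt(2)

# Time-domain expression (14.31).
P_time = A**2 / Ts * np.sum(Kxx * Rgg(m * Ts))

# Frequency-domain expression (14.32), evaluated by a Riemann sum over f.
f = np.linspace(-6.0, 6.0, 4801)
df = f[1] - f[0]
g_hat_sq = np.exp(-2 * np.pi * f**2)             # |g_hat(f)|^2 for the Gaussian pulse
spectrum = (Kxx[None, :] * np.exp(2j * np.pi * f[:, None] * m[None, :] * Ts)).sum(axis=1).real
P_freq = A**2 / Ts * np.sum(spectrum * g_hat_sq) * df

# Special case (14.33): uncorrelated unit-variance symbols.
Kxx_white = (m == 0).astype(float)
P_white = A**2 / Ts * np.sum(Kxx_white * Rgg(m * Ts))
```

For uncorrelated unit-variance symbols the sum collapses to the single term $m = 0$, giving $A^2 \|g\|_2^2 / T_s$ with $\|g\|_2^2 = R_{gg}(0) = 1/\sqrt{2}$ for this pulse.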
14.5.2 Bi-Infinite Block-Mode

The bi-infinite block-mode with a $(K, N)$ binary-to-reals block encoder

\[
\text{enc}\colon \{0,1\}^K \to \mathbb{R}^N
\]

is depicted in Figure 14.1 and can be described as follows. A bi-infinite sequence
of data bits $(D_j,\ j \in \mathbb{Z})$ is fed to an encoder. The encoder parses this sequence
into $K$-tuples and defines for every integer $\nu \in \mathbb{Z}$ the “$\nu$-th data block” $\mathbf{D}_\nu$

\[
\mathbf{D}_\nu \triangleq (D_{\nu K + 1}, \ldots, D_{\nu K + K}), \quad \nu \in \mathbb{Z}. \tag{14.34}
\]

Each data block $\mathbf{D}_\nu$ is then mapped by $\text{enc}(\cdot)$ to a real $N$-tuple, which we denote
by $\mathbf{X}_\nu$:

\[
\mathbf{X}_\nu \triangleq \text{enc}(\mathbf{D}_\nu), \quad \nu \in \mathbb{Z}. \tag{14.35}
\]

The bi-infinite sequence $(X_\ell,\ \ell \in \mathbb{Z})$ produced by the encoder is the concatenation
of these $N$-tuples so

\[
(X_{\nu N + 1}, \ldots, X_{\nu N + N}) = \mathbf{X}_\nu, \quad \nu \in \mathbb{Z}. \tag{14.36}
\]

Stated differently, for every $\nu \in \mathbb{Z}$ and $\eta \in \{1, \ldots, N\}$, the symbol $X_{\nu N + \eta}$ is the
$\eta$-th component of the $N$-tuple $\mathbf{X}_\nu$. The transmitted signal $X(\cdot)$ is as in (14.15)
with the pulse shape $g$ satisfying the decay condition (14.17) and with $T_s > 0$ being
arbitrary. (The boundedness condition (14.16) is always guaranteed in bi-infinite
block encoding.)


We next compute the power $P$ in $X(\cdot)$ under the assumption that the data bits
$(D_j,\ j \in \mathbb{Z})$ are independent and identically distributed (IID) random bits, where
we adopt the following definition.

Definition 14.5.1 (IID Random Bits). We say that a collection of random variables
are IID random bits if the random variables are independent and each of them
takes on the values 0 and 1 equiprobably.

The assumption that the bi-infinite data sequence $(D_j,\ j \in \mathbb{Z})$ consists of IID
random bits is equivalent to the assumption that the $K$-tuples $(\mathbf{D}_\nu,\ \nu \in \mathbb{Z})$ are
IID with $\mathbf{D}_\nu$ being uniformly distributed over the set of binary $K$-tuples $\{0,1\}^K$.
We shall also assume that the real $N$-tuple $\text{enc}(\mathbf{D})$ is of zero mean whenever the
binary $K$-tuple $\mathbf{D}$ is uniformly distributed over $\{0,1\}^K$. We will show that, subject to
these assumptions,

\[
P = \frac{1}{N T_s}\, \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr)^{2} dt \Bigr]. \tag{14.37}
\]

This expression has an interesting interpretation. On the LHS is the power in
the transmitted signal in bi-infinite block encoding using the $(K, N)$ binary-to-reals
block encoder $\text{enc}(\cdot)$. On the RHS is the quantity $E/(N T_s)$, where $E$, as in (14.3), is
the expected energy in the signal that results when only the $K$-tuple $(D_1, \ldots, D_K)$
is transmitted from time $-\infty$ to time $+\infty$. Using the definition of the energy
per symbol $E_s$ (14.7) we can also rewrite (14.37) as in (14.8). Thus, in bi-infinite
block-mode, the transmitted power is the energy per real symbol $E_s$ normalized by
the signaling period $T_s$. Also, by (14.5), we can rewrite (14.37) as

\[
P = \frac{A^2}{N T_s} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[X_\ell X_{\ell'}]\, e^{i2\pi f(\ell - \ell') T_s}\, |\hat{g}(f)|^2\, df. \tag{14.38}
\]


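To derive (14.37) we first express the transmitted waveform $X(\cdot)$ as

\[
\begin{aligned}
X(t) &= A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s) \\
&= A \sum_{\nu=-\infty}^{\infty} \sum_{\eta=1}^{N} X_{\nu N + \eta}\, g\bigl( t - (\nu N + \eta) T_s \bigr) \\
&= A \sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu,\, t - \nu N T_s), \quad t \in \mathbb{R},
\end{aligned}
\tag{14.39}
\]

where the function $u\colon \mathbb{R}^N \times \mathbb{R} \to \mathbb{R}$ is given by

\[
u\colon (x_1, \ldots, x_N, t) \mapsto \sum_{\eta=1}^{N} x_\eta\, g(t - \eta T_s). \tag{14.40}
\]

We now make three observations. The first is that because the law of $\mathbf{D}_\nu$ does not
depend on $\nu$, neither does the law of $\mathbf{X}_\nu$ ($= \text{enc}(\mathbf{D}_\nu)$):

\[
\mathbf{X}_\nu \stackrel{\mathscr{L}}{=} \mathbf{X}_{\nu'}, \quad \nu, \nu' \in \mathbb{Z}. \tag{14.41}
\]

The second is that the assumption that $\text{enc}(\mathbf{D})$ is of zero mean whenever $\mathbf{D}$ is
uniformly distributed over $\{0,1\}^K$ implies by (14.40) that

\[
\mathsf{E}\bigl[ u(\mathbf{X}_\nu, t) \bigr] = 0, \quad (\nu \in \mathbb{Z},\ t \in \mathbb{R}). \tag{14.42}
\]

The third is that the hypothesis that the data bits $(D_j,\ j \in \mathbb{Z})$ are IID implies
that $(\mathbf{D}_\nu,\ \nu \in \mathbb{Z})$ are IID and hence that $(\mathbf{X}_\nu,\ \nu \in \mathbb{Z})$ are also IID. Consequently,
since the independence of $\mathbf{X}_\nu$ and $\mathbf{X}_{\nu'}$ implies the independence of $u(\mathbf{X}_\nu, t)$ and
$u(\mathbf{X}_{\nu'}, t')$, it follows from (14.42) that

\[
\mathsf{E}\bigl[ u(\mathbf{X}_\nu, t)\, u(\mathbf{X}_{\nu'}, t') \bigr] = 0, \quad (t, t' \in \mathbb{R},\ \nu \ne \nu',\ \nu, \nu' \in \mathbb{Z}). \tag{14.43}
\]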

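Using (14.39) and these three observations we can now compute for any epoch $\tau \in \mathbb{R}$
the expected energy in the time interval $[\tau, \tau + N T_s)$ as

\[
\begin{aligned}
\int_{\tau}^{\tau + N T_s} \mathsf{E}\bigl[ X^2(t) \bigr]\, dt
&= \int_{\tau}^{\tau + N T_s} \mathsf{E}\Bigl[ \Bigl( A \sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu,\, t - \nu N T_s) \Bigr)^{2} \Bigr]\, dt \\
&= A^2 \int_{\tau}^{\tau + N T_s} \sum_{\nu=-\infty}^{\infty} \sum_{\nu'=-\infty}^{\infty} \mathsf{E}\bigl[ u(\mathbf{X}_\nu,\, t - \nu N T_s)\, u(\mathbf{X}_{\nu'},\, t - \nu' N T_s) \bigr]\, dt \\
&= A^2 \int_{\tau}^{\tau + N T_s} \sum_{\nu=-\infty}^{\infty} \mathsf{E}\bigl[ u^2(\mathbf{X}_\nu,\, t - \nu N T_s) \bigr]\, dt \\
&= A^2 \int_{\tau}^{\tau + N T_s} \sum_{\nu=-\infty}^{\infty} \mathsf{E}\bigl[ u^2(\mathbf{X}_0,\, t - \nu N T_s) \bigr]\, dt \\
&= A^2 \sum_{\nu=-\infty}^{\infty} \int_{\tau - \nu N T_s}^{\tau - (\nu - 1) N T_s} \mathsf{E}\bigl[ u^2(\mathbf{X}_0, t') \bigr]\, dt' \\
&= A^2 \int_{-\infty}^{\infty} \mathsf{E}\bigl[ u^2(\mathbf{X}_0, t') \bigr]\, dt' \\
&= \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr)^{2} dt \Bigr], \quad \tau \in \mathbb{R},
\end{aligned}
\tag{14.44}
\]

where the first equality follows from (14.39); the second by writing the square as
a product and by using the linearity of expectation; the third from (14.43); the
fourth because the law of $\mathbf{X}_\nu$ does not depend on $\nu$ (14.41); the fifth by changing
the integration variable to $t' \triangleq t - \nu N T_s$; the sixth because the sum of the integrals
is equal to the integral over $\mathbb{R}$; and the seventh by (14.40).

Note that, indeed, the RHS of (14.44) does not depend on the starting epoch $\tau$ of
the interval. Because there are $\lfloor 2T/(N T_s) \rfloor$ disjoint length-$N T_s$ half-open intervals
contained in the interval $[-T, T)$ and because $\lceil 2T/(N T_s) \rceil$ such intervals suffice to
cover the interval $[-T, T)$, it follows that

\[
\begin{aligned}
\Bigl\lfloor \frac{2T}{N T_s} \Bigr\rfloor\, \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr)^{2} dt \Bigr]
&\le \mathsf{E}\Bigl[ \int_{-T}^{T} X^2(t)\, dt \Bigr] \\
&\le \Bigl\lceil \frac{2T}{N T_s} \Bigr\rceil\, \mathsf{E}\Bigl[ \int_{-\infty}^{\infty} \Bigl( A \sum_{\ell=1}^{N} X_\ell\, g(t - \ell T_s) \Bigr)^{2} dt \Bigr].
\end{aligned}
\]

Dividing by $2T$ and then letting $T$ tend to infinity establishes (14.37).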
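The identity (14.37), and with it $P = E_s/T_s$ from (14.8), can be illustrated by simulation. The sketch below makes several assumptions for the sake of a concrete example: a rectangular pulse of width $1.5\,T_s$, a hypothetical zero-mean $(K, N) = (2, 3)$ encoder `enc` that repeats the first antipodal symbol, and a long but finite run of 2000 blocks standing in for the bi-infinite sequence. The simulated time-average power is therefore only a statistical estimate of $P$.

```python
import itertools
import numpy as np

A, Ts, W = 2.0, 1.0, 1.5      # assumed amplitude, baud period, pulse width
K, N = 2, 3                   # block encoder parameters

def enc(bits):
    """Hypothetical zero-mean (K=2, N=3) encoder that repeats the first symbol."""
    x1, x2 = 1 - 2 * bits[0], 1 - 2 * bits[1]
    return np.array([x1, x1, x2], dtype=float)

def Rgg(tau):
    """Self-similarity function of a width-W rectangular pulse."""
    return np.maximum(W - np.abs(tau), 0.0)

# Expected energy E of a single block via (14.3), then P = E / (N Ts) as in (14.37).
codewords = [enc(b) for b in itertools.product((0, 1), repeat=K)]
M = np.mean([np.outer(x, x) for x in codewords], axis=0)
ell = np.arange(1, N + 1)
E = A**2 * np.sum(M * Rgg((ell[:, None] - ell[None, :]) * Ts))
P_formula = E / (N * Ts)
E_s, E_b = E / N, E / K       # energy per real symbol (14.7) and per bit (14.6)

# Time-average power of a long (finite) stretch of block-encoded IID bits.
rng = np.random.default_rng(0)
num_blocks = 2000
bits = rng.integers(0, 2, size=(num_blocks, K))
symbols = np.concatenate([enc(b) for b in bits])
sps = 20                                      # samples per baud period
impulses = np.zeros(symbols.size * sps)
impulses[::sps] = symbols                     # one impulse per symbol
pulse = np.ones(int(round(W / Ts * sps)))     # sampled rectangular pulse
X = A * np.convolve(impulses, pulse)          # sampled PAM waveform
P_sim = np.sum(X**2) * (Ts / sps) / (symbols.size * Ts)
```

Symbols in different blocks are independent and of zero mean, so the overlap between the last pulse of one block and the first pulse of the next does not contribute on average, which is the role played by (14.43) in the derivation.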


14.5.3 Time Shifts of Pulse Shape Are Orthonormal


We next consider the power in PAM when the time shifts of the real pulse shape by
integer multiples of $T_s$ are orthonormal. To remind the reader of this assumption,
we change notation and denote the pulse shape by $\phi(\cdot)$ and express the
orthonormality condition as

\[
\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{14.45}
\]

The calculation of the power is a bit tricky because (14.45) only guarantees that the
time shifts of the pulse shape are orthogonal over the interval $(-\infty, \infty)$; they need
not be orthogonal over the interval $[-T, +T]$ (even for very large $T$). Nevertheless,
intuition suggests that if $\ell T_s$ and $\ell' T_s$ are both much smaller than $T$, then the
orthogonality of $t \mapsto \phi(t - \ell T_s)$ and $t \mapsto \phi(t - \ell' T_s)$ over the interval $(-\infty, \infty)$
should imply that they are nearly orthogonal over $[-T, T]$. Making this intuition
rigorous is a bit tricky and the calculation of the energy in the interval $[-T, T]$
requires a fair number of approximations that must be justified.

To control these approximations we shall assume a decay condition on the pulse
shape that is identical to (14.17). Thus, we shall assume that there exist positive
constants $\alpha$ and $\beta$ such that

\[
|\phi(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R}. \tag{14.46}
\]

(The pulse shapes used in practice, like those we encountered in (11.31), typically
decay like $1/|t|^2$ so this is not a serious restriction.) We shall also continue to assume
the boundedness condition (14.16) but otherwise make no statistical assumptions
on the symbols $(X_\ell,\ \ell \in \mathbb{Z})$.


The main result of this section is the next theorem. 


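Theorem 14.5.2. Let the continuous-time SP $(X(t), t \in \mathbb{R})$ be given by 

\[
X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{14.47}
\]

where $A \ge 0$; $T_s > 0$; the pulse shape $\phi(\cdot)$ is a Borel measurable function satisfying 
the orthogonality condition (14.45) and the decay condition (14.46); and where the 
random sequence $(X_\ell, \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16). Then 

\[
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\!\left[ \int_{-T}^{T} X^2(t)\, dt \right]
= \frac{A^2}{T_s} \lim_{L \to \infty} \frac{1}{2L+1} \sum_{\ell=-L}^{L} \mathsf{E}\bigl[X_\ell^2\bigr], \tag{14.48}
\]

whenever the limit on the RHS exists. 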
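
As a numerical sanity check of (14.48), the following sketch evaluates the time-averaged power for one hypothetical choice of pulse: the unit-energy rectangular pulse equal to $1/\sqrt{T_s}$ on $[0, T_s)$ and zero elsewhere, whose time shifts by integer multiples of $T_s$ are orthonormal and which trivially meets the decay condition. The function name, the four-point symbol alphabet, and the window length are illustrative assumptions, not part of the theorem.

```python
import random


def pam_power_rect(symbols, amplitude, ts, window):
    """Time-averaged power over [-window, window) of
    X(t) = amplitude * sum_l symbols[l] * phi(t - l*ts),
    assuming phi(t) = 1/sqrt(ts) on [0, ts) and 0 elsewhere."""
    energy = 0.0
    for l, x in symbols.items():
        # Overlap of the l-th pulse support [l*ts, (l+1)*ts) with the window.
        lo = max(l * ts, -window)
        hi = min((l + 1) * ts, window)
        if hi > lo:
            # The pulses do not overlap, so X(t)^2 = (amplitude*x)^2 / ts here.
            energy += (amplitude * x) ** 2 / ts * (hi - lo)
    return energy / (2.0 * window)


random.seed(0)
A, TS, WINDOW = 2.0, 0.5, 1000.3
indices = range(-2100, 2100)  # enough symbols to cover [-WINDOW, WINDOW)
symbols = {l: random.choice((-3.0, -1.0, 1.0, 3.0)) for l in indices}
mean_square = sum(x * x for x in symbols.values()) / len(symbols)

# Left: measured time-averaged power. Right: (A^2 / Ts) times the mean square symbol.
print(pam_power_rect(symbols, A, TS, WINDOW), A ** 2 / TS * mean_square)
```

The two printed numbers should be close but not identical, because the window average only sees the symbols whose pulses fall inside the window. For pulses with overlapping supports the same comparison would require numerical integration, which is where the approximations in the proof come in.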


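Proof. The proof is somewhat technical and may be skipped. We begin by arguing 
that it suffices to prove the theorem for the case where $T_s = 1$. To see this, assume 
that $T_s > 0$ is not necessarily equal to 1. Define the function 

\[
\tilde\phi(t) \triangleq \sqrt{T_s}\, \phi(T_s t), \quad t \in \mathbb{R}, \tag{14.49}
\]

and note that, by changing the integration variable to $\tau \triangleq t T_s$, 

\[
\int_{-\infty}^{\infty} \tilde\phi(t - \ell)\, \tilde\phi(t - \ell')\, dt
= \int_{-\infty}^{\infty} \phi(\tau - \ell T_s)\, \phi(\tau - \ell' T_s)\, d\tau
= \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}, \tag{14.50a}
\]

where the second equality follows from the theorem’s assumption about the 
orthogonality of the time shifts of $\phi$ by integer multiples of $T_s$. Also, by (14.49) and 
(14.46) we obtain 

\[
|\tilde\phi(t)| = \sqrt{T_s}\, |\phi(T_s t)|
\le \sqrt{T_s}\, \frac{\beta}{1 + |t|^{1+\alpha}}
= \frac{\beta'}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{14.50b}
\]

for some $\beta' > 0$ and $\alpha > 0$. 
As to the power, by changing the integration variable to $\sigma \triangleq t/T_s$ we obtain 

\[
\frac{1}{2T} \int_{-T}^{T} \Bigl( \sum_{\ell \in \mathbb{Z}} X_\ell\, \phi(t - \ell T_s) \Bigr)^{2} dt
= \frac{1}{T_s} \cdot \frac{1}{2T/T_s} \int_{-T/T_s}^{T/T_s} \Bigl( \sum_{\ell \in \mathbb{Z}} X_\ell\, \tilde\phi(\sigma - \ell) \Bigr)^{2} d\sigma. \tag{14.50c}
\]

It now follows from (14.50a) & (14.50b) that if we prove the theorem for the pulse 
shape $\tilde\phi$ with $T_s = 1$, it will then follow that the power in $\sum_{\ell} X_\ell\, \tilde\phi(\sigma - \ell)$ is equal 
to $\lim_{L \to \infty} (2L+1)^{-1} \sum_{\ell=-L}^{L} \mathsf{E}[X_\ell^2]$ and that consequently, by (14.50c), the power in 
$\sum_{\ell} X_\ell\, \phi(t - \ell T_s)$ is equal to $T_s^{-1} \lim_{L \to \infty} (2L+1)^{-1} \sum_{\ell=-L}^{L} \mathsf{E}[X_\ell^2]$. In the remainder of the 
proof we shall thus assume that $T_s = 1$ and express the decay condition (14.46) as 

\[
|\phi(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{14.51}
\]

for some $\beta, \alpha > 0$. 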


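To further simplify notation we shall assume that $T$ is a positive integer. Indeed, 
if the limit is proved for positive integers, then the general result follows from the 
Sandwich Theorem by noting that for $T > 0$ (not necessarily an integer) 

\[
\frac{1}{2T} \int_{-\lfloor T \rfloor}^{\lfloor T \rfloor} \Bigl( \sum_{\ell \in \mathbb{Z}} X_\ell\, \phi(t - \ell) \Bigr)^{2} dt
\le \frac{1}{2T} \int_{-T}^{T} \Bigl( \sum_{\ell \in \mathbb{Z}} X_\ell\, \phi(t - \ell) \Bigr)^{2} dt
\le \frac{1}{2T} \int_{-\lceil T \rceil}^{\lceil T \rceil} \Bigl( \sum_{\ell \in \mathbb{Z}} X_\ell\, \phi(t - \ell) \Bigr)^{2} dt \tag{14.52}
\]

and by noting that both $\lfloor T \rfloor / T$ and $\lceil T \rceil / T$ tend to 1, as $T \to \infty$. 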


We thus proceed to prove (14.48) for the case where $T_s = 1$ and where the limit 
$T \to \infty$ is only over positive integers. We also assume $A = 1$ because both sides of 
(14.48) scale like $A^2$. We begin by introducing some notation. For every integer $\ell$ 
we denote the mapping $t \mapsto \phi(t - \ell)$ by $\phi_\ell$, and for every positive integer $T$ we 
denote the windowed mapping $t \mapsto \phi(t - \ell)\, \mathrm{I}\{|t| \le T\}$ by $\phi_{\ell,w}$. Finally, we fix some 
(large) integer $\nu > 0$ and define for every $T > \nu$ the random processes 

\begin{align*}
X_0 &= \sum_{|\ell| \le T - \nu} X_\ell\, \phi_{\ell,w}, \tag{14.53} \\
X_1 &= \sum_{T - \nu < |\ell| \le T + \nu} X_\ell\, \phi_{\ell,w}, \tag{14.54} \\
X_2 &= \sum_{T + \nu < |\ell| < \infty} X_\ell\, \phi_{\ell,w}, \tag{14.55}
\end{align*}

and the unwindowed version of $X_0$ 

\[
X_0^{u} = \sum_{|\ell| \le T - \nu} X_\ell\, \phi_{\ell} \tag{14.56}
\]

so 

\begin{align*}
X(t)\, \mathrm{I}\{|t| \le T\} &= X_0(t) + X_1(t) + X_2(t) \\
&= X_0^{u}(t) + \bigl( X_0(t) - X_0^{u}(t) \bigr) + X_1(t) + X_2(t), \quad t \in \mathbb{R}. \tag{14.57}
\end{align*}


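Using arguments very similar to the ones leading to (4.14) (with integration 
replaced by integration and expectation) one can show that (14.57) leads to the 
bound 

\begin{align*}
\Bigl( \sqrt{\mathsf{E}\bigl[\|X_0^{u}\|_2^2\bigr]} - \sqrt{\mathsf{E}\bigl[\|(X_0 - X_0^{u}) + X_1 + X_2\|_2^2\bigr]} \Bigr)^{2}
&\le \mathsf{E}\!\left[ \int_{-T}^{T} X^2(t)\, dt \right] \\
&\le \Bigl( \sqrt{\mathsf{E}\bigl[\|X_0^{u}\|_2^2\bigr]} + \sqrt{\mathsf{E}\bigl[\|(X_0 - X_0^{u}) + X_1 + X_2\|_2^2\bigr]} \Bigr)^{2}. \tag{14.58}
\end{align*}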


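Note that, by the orthonormality assumption on the time shifts of $\phi$, 

\[
\|X_0^{u}\|_2^2 = \sum_{|\ell| \le T - \nu} X_\ell^2
\]

so 

\[
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\bigl[\|X_0^{u}\|_2^2\bigr]
= \lim_{L \to \infty} \frac{1}{2L+1} \sum_{|\ell| \le L} \mathsf{E}\bigl[X_\ell^2\bigr]. \tag{14.59}
\]

It follows from (14.58) and (14.59) that to conclude the proof of the theorem it 
suffices to show that for every fixed $\nu \ge 2$ we have for $T$ exceeding $\nu$ 

\begin{align*}
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\bigl[\|X_1\|_2^2\bigr] &= 0, \tag{14.60} \\
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\bigl[\|X_0 - X_0^{u}\|_2^2\bigr] &= 0, \tag{14.61}
\end{align*}

and that 

\[
\lim_{\nu \to \infty} \varlimsup_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\bigl[\|X_2\|_2^2\bigr] = 0. \tag{14.62}
\]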
We begin with (14.60), which follows directly from the Triangle Inequality, 

\begin{align*}
\|X_1\|_2 &\le \sum_{T - \nu < |\ell| \le T + \nu} |X_\ell|\, \|\phi_{\ell,w}\|_2 \\
&\le 4 \nu \gamma,
\end{align*}

where the second inequality follows from the boundedness condition (14.16), from 
the fact that $\phi_{\ell,w}$ is a windowed version of the unit-energy signal $\phi_\ell$ so $\|\phi_{\ell,w}\|_2 \le \|\phi_\ell\|_2 = 1$, 
and because there are $4\nu$ terms in the sum. 


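We next prove (14.62). To that end we upper-bound $|X_2(t)|$ for $|t| \le T$ as follows: 

\begin{align*}
|X_2(t)| &= \Bigl| \sum_{T + \nu < |\ell| < \infty} X_\ell\, \phi(t - \ell) \Bigr|, \quad |t| \le T \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} |\phi(t - \ell)| \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} \frac{\beta}{|t - \ell|^{1+\alpha}} \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} \frac{\beta}{\bigl| |\ell| - |t| \bigr|^{1+\alpha}} \\
&\le \gamma \sum_{T + \nu < |\ell| < \infty} \frac{\beta}{(|\ell| - T)^{1+\alpha}} \\
&= 2 \gamma \beta \sum_{\ell = T + \nu + 1}^{\infty} \frac{1}{(\ell - T)^{1+\alpha}} \\
&= 2 \gamma \beta \sum_{\tilde\ell = \nu + 1}^{\infty} \frac{1}{\tilde\ell^{\,1+\alpha}} \\
&\le 2 \gamma \beta \int_{\nu}^{\infty} \xi^{-1-\alpha}\, d\xi \\
&= \frac{2 \gamma \beta}{\alpha}\, \nu^{-\alpha}, \tag{14.63}
\end{align*}

where the equality in the first line follows from the definition of $X_2$ (14.55) by 
noting that for $|t| \le T$ we have $\phi_\ell(t) = \phi_{\ell,w}(t)$; the inequality in the second line 
follows from the boundedness condition (14.16) and from the Triangle Inequality for 
Complex Numbers (2.12); the inequality in the third line from the decay condition 
(14.51); the inequality in the fourth line because $|\xi - \zeta| \ge \bigl| |\xi| - |\zeta| \bigr|$ whenever 
$\xi, \zeta \in \mathbb{R}$; the inequality in the fifth line because we are only considering $|t| \le T$ and 
because over the range of this summation $|\ell| > T + \nu$; the equality in the sixth line 
from the symmetry of the summand; the equality in the seventh line by defining 
$\tilde\ell \triangleq \ell - T$; the inequality in the eighth line from the monotonicity of the function 
$\xi \mapsto \xi^{-1-\alpha}$, which implies that 

\[
\frac{1}{\tilde\ell^{\,1+\alpha}} \le \int_{\tilde\ell - 1}^{\tilde\ell} \frac{1}{\xi^{1+\alpha}}\, d\xi;
\]

and where the final equality on the ninth line follows by computing the integral 
and by noting that for $t$ that does not satisfy $|t| \le T$ the LHS $|X_2(t)|$ is zero, so 
the inequality is trivial. 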
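
The eighth and ninth lines of (14.63) compare a tail sum with an integral. The following minimal Python sketch checks that comparison numerically for a few values of $\nu$ and $\alpha$; the function names and the truncation length `terms` are illustrative assumptions, and a truncated sum can only underestimate the infinite tail.

```python
def tail_sum(nu, alpha, terms=100_000):
    """Partial sum of l**-(1+alpha) for l = nu+1, ..., nu+terms.

    This is a lower estimate of the infinite tail sum in (14.63)."""
    return sum(l ** -(1.0 + alpha) for l in range(nu + 1, nu + 1 + terms))


def integral_bound(nu, alpha):
    """Closed form of the integral of xi**-(1+alpha) over [nu, infinity)."""
    return nu ** -alpha / alpha


for nu in (1, 5, 50):
    for alpha in (0.5, 1.0):
        print(nu, alpha, tail_sum(nu, alpha), integral_bound(nu, alpha))
```

In every printed row the third column should not exceed the fourth, and both shrink as $\nu$ grows, which is what drives (14.62).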


Using (14.63) and noting that $X_2(t)$ is zero for $|t| > T$, we conclude that 

\[
\|X_2\|_2^2 \le 2T \Bigl( \frac{2 \gamma \beta}{\alpha}\, \nu^{-\alpha} \Bigr)^{2}, \tag{14.64}
\]

from which (14.62) follows. 


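We next turn to proving (14.61). We begin by using the Triangle Inequality and 
the boundedness condition (14.16) to obtain 

\begin{align*}
\|X_0 - X_0^{u}\|_2^2 &= \Bigl\| \sum_{|\ell| \le T - \nu} X_\ell\, \phi_{\ell,w} - \sum_{|\ell| \le T - \nu} X_\ell\, \phi_{\ell} \Bigr\|_2^2 \\
&= \Bigl\| \sum_{|\ell| \le T - \nu} X_\ell\, (\phi_{\ell,w} - \phi_{\ell}) \Bigr\|_2^2 \\
&\le \gamma^2 \Bigl( \sum_{|\ell| \le T - \nu} \|\phi_{\ell} - \phi_{\ell,w}\|_2 \Bigr)^{2}. \tag{14.65}
\end{align*}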


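We next proceed to upper-bound the RHS of (14.65) by first defining the function 

\[
\rho(\tau) \triangleq \Bigl( \int_{|t| > \tau} \phi^2(t)\, dt \Bigr)^{1/2} \tag{14.66}
\]

and by then using this function to upper-bound $\|\phi_\ell - \phi_{\ell,w}\|_2$ as 

\[
\|\phi_\ell - \phi_{\ell,w}\|_2 \le \rho(T - |\ell|), \quad |\ell| \le T, \tag{14.67}
\]

because 

\begin{align*}
\|\phi_\ell - \phi_{\ell,w}\|_2^2 &= \int_{-\infty}^{-T} \phi^2(t - \ell)\, dt + \int_{T}^{\infty} \phi^2(t - \ell)\, dt \\
&= \int_{-\infty}^{-T - \ell} \phi^2(s)\, ds + \int_{T - \ell}^{\infty} \phi^2(s)\, ds \\
&\le \int_{-\infty}^{-T + |\ell|} \phi^2(s)\, ds + \int_{T - |\ell|}^{\infty} \phi^2(s)\, ds \\
&= \int_{|s| > T - |\ell|} \phi^2(s)\, ds, \quad |\ell| \le T \\
&= \rho^2(T - |\ell|).
\end{align*}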


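It follows from (14.65) and (14.67) that 

\begin{align*}
\|X_0 - X_0^{u}\|_2^2 &\le \gamma^2 \Bigl( \sum_{|\ell| \le T - \nu} \|\phi_{\ell} - \phi_{\ell,w}\|_2 \Bigr)^{2} \\
&\le \gamma^2 \Bigl( \sum_{|\ell| \le T - \nu} \rho(T - |\ell|) \Bigr)^{2} \\
&\le \gamma^2 \Bigl( 2 \sum_{0 \le \ell \le T - \nu} \rho(T - \ell) \Bigr)^{2} \\
&= 4 \gamma^2 \Bigl( \sum_{\eta = \nu}^{T} \rho(\eta) \Bigr)^{2}. \tag{14.68}
\end{align*}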
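We next note that the decay condition (14.51) implies that 

\[
\rho(\tau) \le \Bigl( \frac{2 \beta^2}{1 + 2\alpha} \Bigr)^{1/2} \tau^{-\frac{1}{2} - \alpha}, \quad \tau > 0, \tag{14.69}
\]

because for every $\tau > 0$, 

\begin{align*}
\rho^2(\tau) &= \int_{|t| > \tau} \phi^2(t)\, dt \\
&\le \int_{|t| > \tau} \frac{\beta^2}{|t|^{2 + 2\alpha}}\, dt \\
&= 2 \beta^2 \int_{\tau}^{\infty} t^{-2 - 2\alpha}\, dt \\
&= \frac{2 \beta^2}{1 + 2\alpha}\, \tau^{-1 - 2\alpha},
\end{align*}

and hence, by evaluating the integral explicitly, that 

\[
\lim_{T \to \infty} \frac{1}{T} \Bigl( \sum_{\eta = \nu}^{T} \rho(\eta) \Bigr)^{2} = 0. \tag{14.70}
\]

From (14.68) and (14.70) we thus obtain (14.61). 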
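
To see the decay in (14.70) concretely, the sketch below replaces $\rho(\eta)$ by the power law $\eta^{-1/2-\alpha}$ from (14.69), dropping the constant factor, and evaluates the normalized squared sum for growing $T$. The function name and the chosen values of $\nu$ and $\alpha$ are illustrative assumptions.

```python
def normalized_sum(T, nu, alpha):
    """(1/T) * (sum of eta**-(1/2 + alpha) for eta = nu, ..., T) ** 2.

    Uses the power-law upper bound of (14.69) in place of rho itself."""
    s = sum(eta ** -(0.5 + alpha) for eta in range(nu, T + 1))
    return s * s / T


for T in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
    print(T, normalized_sum(T, nu=2, alpha=0.25))
```

For $\alpha < 1/2$ the sum itself grows like $T^{1/2-\alpha}$, so the normalized square behaves like $T^{-2\alpha}$; the decay is slow when $\alpha$ is small but it is still a decay to zero.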


14.6 A More Formal Account 


In this section we present a more formal definition of power and justify some of 
the mathematical steps that we took in deriving the power in PAM signals. This 
section is quite mathematical and is recommended for readers who have had some 
exposure to Measure Theory. 


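Let $\mathcal{R}$ denote the $\sigma$-algebra generated by the open sets in $\mathbb{R}$. A continuous-time 
stochastic process $(X(t))$ defined over the probability space $(\Omega, \mathcal{F}, P)$ is said to be 
a measurable stochastic process if the mapping $(\omega, t) \mapsto X(\omega, t)$ from $\Omega \times \mathbb{R}$ 
to $\mathbb{R}$ is measurable when its range $\mathbb{R}$ is endowed with the $\sigma$-algebra $\mathcal{R}$ and when its 
domain $\Omega \times \mathbb{R}$ is endowed with the product $\sigma$-algebra $\mathcal{F} \times \mathcal{R}$. Thus, $(X(t), t \in \mathbb{R})$ 
is measurable if the mapping $(\omega, t) \mapsto X(\omega, t)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable.⁵ 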

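From Fubini’s Theorem it follows that if $(X(t), t \in \mathbb{R})$ is measurable and if $T > 0$ 
is deterministic, then: 

(i) For every $\omega \in \Omega$, the mapping $t \mapsto X^2(\omega, t)$ is Borel measurable; 

(ii) the mapping 
\[
\omega \mapsto \int_{-T}^{T} X^2(\omega, t)\, dt
\]
is a random variable (i.e., $\mathcal{F}$ measurable) possibly taking on the value $+\infty$; 

(iii) and 
\[
\mathsf{E}\!\left[ \int_{-T}^{T} X^2(t)\, dt \right] = \int_{-T}^{T} \mathsf{E}\bigl[X^2(t)\bigr]\, dt, \quad T \in \mathbb{R}. \tag{14.71}
\]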


Definition 14.6.1 (Power of a Stochastic Process). We say that a measurable 
stochastic process $(X(t), t \in \mathbb{R})$ is of power P if the limit 

\[
\lim_{T \to \infty} \frac{1}{2T}\, \mathsf{E}\!\left[ \int_{-T}^{T} X^2(t)\, dt \right] \tag{14.72}
\]

exists and is equal to P. 


Proposition 14.6.2. If the pulse shape $g$ is a Borel measurable function satisfying 
the decay condition (14.17) for some positive $\alpha, \beta, T_s$, and if the discrete-time SP 
$(X_\ell, \ell \in \mathbb{Z})$ satisfies the boundedness condition (14.16) for some $\gamma \ge 0$, then the 
stochastic process 

\[
X \colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{14.73}
\]

is a measurable stochastic process. 


Proof. The mapping $(\omega, t) \mapsto X_\ell(\omega)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable because $X_\ell$ is a 
random variable, so the mapping $\omega \mapsto X_\ell(\omega)$ is $\mathcal{F} / \mathcal{R}$ measurable. The mapping 
$(\omega, t) \mapsto A\, g(t - \ell T_s)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable because $g$ is Borel measurable, so 
$t \mapsto g(t - \ell T_s)$ is $\mathcal{R} / \mathcal{R}$ measurable. Since the product of measurable functions is 
measurable (Rudin, 1974, Chapter 1, Section 1.9 (c)), it follows that the mapping 
$(\omega, t) \mapsto A\, X_\ell(\omega)\, g(t - \ell T_s)$ is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable. And since the sum of 
measurable functions is measurable (Rudin, 1974, Chapter 1, Section 1.9 (c)), it follows 
that for every positive integer $L$, the mapping 

\[
(\omega, t) \mapsto A \sum_{\ell=-L}^{L} X_\ell(\omega)\, g(t - \ell T_s)
\]

is $\mathcal{F} \times \mathcal{R} / \mathcal{R}$ measurable. The proposition now follows by recalling that the pointwise 
limit of every pointwise convergent sequence of measurable functions is measurable 
(Rudin, 1974, Theorem 1.14). 


⁵ See (Billingsley, 1995, Section 37, p. 503) or (Loève, 1963, Section 35) for the definition of a 
measurable stochastic process and see (Billingsley, 1995, Section 18) or (Loève, 1963, Section 8.2) 
or (Halmos, 1950, Chapter VII) for the definition of the product $\sigma$-algebra. 


Having established that the PAM signal (14.73) is a measurable stochastic process 
we would next like to justify the calculations leading to (14.31). To justify the 
swapping of integration and summations in (14.26) we shall need the following 
lemma, which also explains why the sum in (14.27) converges. 


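Lemma 14.6.3. If $g(\cdot)$ is a Borel measurable function satisfying the decay condition 

\[
|g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{14.74}
\]

for some positive $\alpha$, $T_s$, and $\beta$, then 

\[
\sum_{m=-\infty}^{\infty} \int_{-\infty}^{\infty} |g(t)\, g(t - m T_s)|\, dt < \infty. \tag{14.75}
\]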


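Proof. The decay condition (14.74) guarantees that $g$ is of finite energy. From the 
Cauchy-Schwarz Inequality it thus follows that the terms in (14.75) are all finite. 
Also, by symmetry, the term in (14.75) corresponding to $m$ is the same as the one 
corresponding to $-m$. Consequently, to establish (14.75), it suffices to prove 

\[
\sum_{m=2}^{\infty} \int_{-\infty}^{\infty} |g(t)\, g(t - m T_s)|\, dt < \infty. \tag{14.76}
\]

Define the function 

\[
g_u(t) \triangleq \begin{cases} 1 & \text{if } |t| \le 1, \\ |t|^{-1-\alpha} & \text{otherwise,} \end{cases} \quad t \in \mathbb{R}.
\]

By (14.74) it follows that $|g(t)| \le \beta\, g_u(t/T_s)$ for all $t \in \mathbb{R}$. Consequently, 

\begin{align*}
\int_{-\infty}^{\infty} |g(t)\, g(t - m T_s)|\, dt &\le \beta^2 \int_{-\infty}^{\infty} g_u(t/T_s)\, g_u(t/T_s - m)\, dt \\
&= \beta^2\, T_s \int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau,
\end{align*}

and to establish (14.76) it thus suffices to prove 

\[
\sum_{m=2}^{\infty} \int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau < \infty. \tag{14.77}
\]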
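Since the integrand in (14.77) is symmetric around $\tau = m/2$, it follows that 

\[
\int_{-\infty}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau = 2 \int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau, \tag{14.78}
\]

and it thus suffices to establish 

\[
\sum_{m=2}^{\infty} \int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau < \infty. \tag{14.79}
\]

We next upper-bound the integral in (14.79) for every $m \ge 2$ by first expressing it 
as 

\[
\int_{m/2}^{\infty} g_u(\tau)\, g_u(\tau - m)\, d\tau = I_1 + I_2 + I_3,
\]

where 

\begin{align*}
I_1 &\triangleq \int_{m/2}^{m-1} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(m - \tau)^{1+\alpha}}\, d\tau, \\
I_2 &\triangleq \int_{m-1}^{m+1} \frac{1}{\tau^{1+\alpha}}\, d\tau, \\
I_3 &\triangleq \int_{m+1}^{\infty} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(\tau - m)^{1+\alpha}}\, d\tau.
\end{align*}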


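We next upper-bound each of these terms for $m \ge 2$. Starting with $I_1$ we obtain 
upon defining $\xi \triangleq m - \tau$ 

\begin{align*}
I_1 &= \int_{m/2}^{m-1} \frac{1}{\tau^{1+\alpha}}\, \frac{1}{(m - \tau)^{1+\alpha}}\, d\tau \\
&= \int_{1}^{m/2} \frac{1}{(m - \xi)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi \\
&\le \int_{1}^{m/2} \frac{1}{(m/2)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi \\
&= \frac{2^{1+\alpha}}{\alpha\, m^{1+\alpha}} \Bigl( 1 - \Bigl( \frac{2}{m} \Bigr)^{\alpha} \Bigr), \quad m \ge 2,
\end{align*}

which is summable over $m$. As to $I_2$ we have 

\begin{align*}
I_2 &= \int_{m-1}^{m+1} \frac{1}{\tau^{1+\alpha}}\, d\tau \\
&\le \frac{2}{(m-1)^{1+\alpha}}, \quad m \ge 2,
\end{align*}

which is summable over $m$. Finally, upon defining $\xi \triangleq \tau - m$, 

\begin{align*}
I_3 &= \int_{1}^{\infty} \frac{1}{(\xi + m)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi \\
&= \int_{1}^{m} \frac{1}{(\xi + m)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi + \int_{m}^{\infty} \frac{1}{(\xi + m)^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi \\
&\le \frac{1}{m^{1+\alpha}} \int_{1}^{m} \frac{1}{\xi^{1+\alpha}}\, d\xi + \int_{m}^{\infty} \frac{1}{\xi^{1+\alpha}}\, \frac{1}{\xi^{1+\alpha}}\, d\xi \\
&= \frac{1}{\alpha\, m^{1+\alpha}} \bigl( 1 - m^{-\alpha} \bigr) + \frac{1}{1 + 2\alpha}\, \frac{1}{m^{1+2\alpha}}, \quad m \ge 2,
\end{align*}

which is summable over $m$. 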


We can now state (14.31) as a theorem. 


Theorem 14.6.4. Let the pulse shape $g \colon \mathbb{R} \to \mathbb{R}$ be a Borel measurable function 
satisfying the decay condition (14.17) for some positive $\alpha$, $\beta$, and $T_s$. Let $(X_\ell, \ell \in \mathbb{Z})$ 
be a centered WSS SP of autocovariance function $K_{XX}$ and satisfying the 
boundedness condition (14.16) for some $\gamma \ge 0$. Then the stochastic process (14.73) is 
measurable and is of the power P given in (14.31). 


Proof. The measurability of $(X(t), t \in \mathbb{R})$ follows from Proposition 14.6.2. The 
power can be derived as in the derivation of (14.31) from (14.27) with the derivation 
of (14.27) now being justifiable by noting that (14.25) follows from (14.71) and by 
noting that (14.26) follows from Lemma 14.6.3 and Fubini’s Theorem. 


Similarly, we can state (14.37) as a theorem. 


Theorem 14.6.5 (Power in Bi-Infinite Block-Mode PAM). Let $(D_j, j \in \mathbb{Z})$ be 
IID random bits. Let the (K,N) binary-to-reals encoder enc: $\{0,1\}^K \to \mathbb{R}^N$ be 
such that enc$(D_1, \ldots, D_K)$ is of zero mean whenever the K-tuple $(D_1, \ldots, D_K)$ is 
uniformly distributed over $\{0,1\}^K$. Let $(X_\ell, \ell \in \mathbb{Z})$ be generated from $(D_j, j \in \mathbb{Z})$ 
in bi-infinite block encoding mode using enc(·). Assume that the pulse shape $g$ is a 
Borel measurable function satisfying the decay condition (14.17) for some positive 
$\alpha$, $\beta$, and $T_s$. Then the stochastic process (14.73) is measurable and is of the 
power P as given in (14.37). 


Proof. Measurability follows from Proposition 14.6.2. The derivation of (14.37) is 
justified using Fubini’s Theorem. 


14.7. Exercises 


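Exercise 14.1 (Superimposing Independent Transmissions). Let the two PAM signals 
$(X^{(1)}(t))$ and $(X^{(2)}(t))$ be given at every epoch $t \in \mathbb{R}$ by 

\[
X^{(1)}(t) = A^{(1)} \sum_{\ell=-\infty}^{\infty} X_\ell^{(1)}\, g^{(1)}(t - \ell T_s), \qquad
X^{(2)}(t) = A^{(2)} \sum_{\ell=-\infty}^{\infty} X_\ell^{(2)}\, g^{(2)}(t - \ell T_s),
\]

where the zero-mean real symbols $(X_\ell^{(1)})$ are generated from the data bits $(D_j^{(1)})$ and 
the zero-mean real symbols $(X_\ell^{(2)})$ from $(D_j^{(2)})$. Assume that the bit streams $(D_j^{(1)})$ and 
$(D_j^{(2)})$ are independent and that $(X^{(1)}(t))$ and $(X^{(2)}(t))$ are of powers $P^{(1)}$ and $P^{(2)}$. 
Find the power in the sum of $(X^{(1)}(t))$ and $(X^{(2)}(t))$. 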
Exercise 14.2 (The Minimum Distance of a Constellation and Power). Consider the 
PAM signal (14.47) where the time shifts of the pulse shape $\phi$ by integer multiples of $T_s$ 
are orthonormal, and where the symbols $(X_\ell)$ are IID and uniformly distributed over the 
set $\{\pm\tfrac{d}{2}, \pm\tfrac{3d}{2}, \ldots, \pm(2\nu - 1)\tfrac{d}{2}\}$. Relate the power in X(·) to the minimum distance $d$ and 
the constant A. 


Exercise 14.3 (PAM with Nonorthogonal Pulses). Let the IID random bits $(D_j, j \in \mathbb{Z})$ 
be modulated using PAM with the pulse shape $g \colon t \mapsto \mathrm{I}\{|t| \le T_s\}$ and the repetition 
block encoding map $0 \mapsto (+1, +1)$ and $1 \mapsto (-1, -1)$. Compute the average transmitted 
power. 


Exercise 14.4 (Non-IID Data Bits). Expression (14.37) for the power in bi-infinite block 
mode was derived under the assumption that the data bits are IID. Show that it need 
not otherwise hold. 


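Exercise 14.5 (The Power in Nonorthogonal PAM). Consider the PAM signal (14.23) 
with the pulse shape $g \colon t \mapsto \mathrm{I}\{|t| \le T_s\}$. 


(i) Compute the power in X(·) when $(X_\ell)$ are IID of zero-mean and unit-variance. 


(ii) Repeat when $(X_\ell)$ is a zero-mean WSS SP of autocovariance function 

\[
K_{XX}(m) = \begin{cases} 1 & \text{if } m = 0, \\ \tfrac{1}{2} & \text{if } |m| = 1, \\ 0 & \text{otherwise,} \end{cases} \quad m \in \mathbb{Z}.
\]

Note that in both parts $\mathsf{E}[X_\ell] = 0$ and $\mathsf{E}[X_\ell^2] = 1$. 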


Exercise 14.6 (Pre-Encoding). Rather than applying the mapping enc: $\{0,1\}^K \to \mathbb{R}^N$ 
to the IID random bits $D_1, \ldots, D_K$ directly, we first map the data bits using a one-to-one 
mapping $\varphi \colon \{0,1\}^K \to \{0,1\}^K$ to $D'_1, \ldots, D'_K$ and we then map $D'_1, \ldots, D'_K$ using enc 
to $X_1, \ldots, X_N$. Does this change the transmitted energy? 


Exercise 14.7 (Binary Linear Encoders Producing Pairwise-Independent Symbols). 
Binary linear encoders with the antipodal mapping can be described as follows. Using a 
deterministic binary $K \times N$ matrix G, the encoder first maps the row-vector $d = (d_1, \ldots, d_K)$ 
to the row-vector $dG$, where $dG$ is computed using matrix multiplication over the binary 
field. (Recall that in the binary field multiplication is defined as $0 \cdot 0 = 0 \cdot 1 = 1 \cdot 0 = 0$, 
and $1 \cdot 1 = 1$; and addition is modulo 2, so $0 \oplus 0 = 1 \oplus 1 = 0$ and $0 \oplus 1 = 1 \oplus 0 = 1$.) 
Thus, the $\ell$-th component $c_\ell$ of $dG$ is given by 

\[
c_\ell = d_1 \cdot g^{(1,\ell)} \oplus d_2 \cdot g^{(2,\ell)} \oplus \cdots \oplus d_K \cdot g^{(K,\ell)}.
\]

The real symbol $x_\ell$ is then computed according to the rule 

\[
x_\ell = \begin{cases} +1 & \text{if } c_\ell = 0, \\ -1 & \text{if } c_\ell = 1, \end{cases} \quad \ell = 1, \ldots, N.
\]

Let $X_1, X_2, \ldots, X_N$ be the symbols produced by the encoder when it is fed IID random 
bits $D_1, D_2, \ldots, D_K$. Show that: 


(i) Unless all the entries in the $\ell$-th column of G are zero, $\mathsf{E}[X_\ell] = 0$. 

(ii) $X_\ell$ is independent of $X_{\ell'}$ if, and only if, the $\ell$-th column and the $\ell'$-th column of G 
are not identical. 


You may find it useful to first prove the following. 


(i) If a RV $E$ takes value in the set {0,1}, and if $F$ takes on the values 0 and 1 
equiprobably and independently of $E$, then $E \oplus F$ is uniform on {0,1} and independent of $E$. 


(ii) If $E_1$ and $E_2$ take value in {0,1}, and if $F$ takes on the values 0 and 1 equiprobably 
and independently of $(E_1, E_2)$, then $E_1 \oplus F$ is independent of $E_2$. 
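
The encoder described in this exercise is easy to simulate exhaustively for small K, which can help build intuition before attempting the proofs. The sketch below is not a solution: it enumerates all $2^K$ equally likely inputs for a hypothetical generator matrix and reports the symbol means and pairwise correlations. The function names and the example matrix are assumptions made for illustration. For two random variables that each take only two values, zero covariance is equivalent to independence, so for zero-mean symbols a zero off-diagonal correlation indicates pairwise independence.

```python
from itertools import product


def encode(d, G):
    """Map the bit tuple d to antipodal symbols via c = dG over the binary field."""
    K, N = len(G), len(G[0])
    c = [sum(d[k] & G[k][l] for k in range(K)) % 2 for l in range(N)]
    return [1 if bit == 0 else -1 for bit in c]  # 0 -> +1, 1 -> -1


def symbol_stats(G):
    """Exact means and correlations E[X_a X_b] under IID uniform data bits."""
    K, N = len(G), len(G[0])
    words = [encode(d, G) for d in product((0, 1), repeat=K)]
    n = len(words)
    means = [sum(w[l] for w in words) / n for l in range(N)]
    corr = [[sum(w[a] * w[b] for w in words) / n for b in range(N)]
            for a in range(N)]
    return means, corr


# Hypothetical 2 x 3 generator matrix with distinct nonzero columns.
G_EXAMPLE = [[1, 0, 1],
             [0, 1, 1]]
means, corr = symbol_stats(G_EXAMPLE)
print(means)
print(corr)
```

Replacing one column of the example matrix by a copy of another, or by an all-zero column, shows how the mean and the correlations change in the cases the exercise singles out.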


Exercise 14.8 (Zero-Mean Signals for Linearly Dispersive Channels). Suppose that the 
transmitted signal X suffers not only from an additive random disturbance but also 
from a deterministic linear distortion. Thus, the received signal Y can be expressed as 
$Y = X \star h + N$, where $h$ is a known (deterministic) impulse response, and where N is 
an unknown (random) additive disturbance. Show heuristically that transmitting signals 
of nonzero mean is power inefficient. How would you mimic the performance of a system 
transmitting X(·) using a system transmitting X(·) − c(·)? 


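Exercise 14.9 (The Power in Orthogonal Code-Division Multi-Accessing). Suppose that 
the data bits $(D_j^{(1)})$ are mapped to the real symbols $(X_\ell^{(1)})$ and that the data bits $(D_j^{(2)})$ 
are mapped to $(X_\ell^{(2)})$. Assume that 

\[
\frac{(A^{(1)})^2}{T_s} \lim_{L \to \infty} \frac{1}{2L+1} \sum_{\ell=-L}^{L} \mathsf{E}\bigl[(X_\ell^{(1)})^2\bigr] = P^{(1)}
\]

and similarly for $P^{(2)}$. Further assume that the time shifts of $\phi$ by integer multiples of $T_s$ 
are orthonormal and that $\phi$ satisfies the decay condition (14.46). Finally assume that 
$(X_\ell^{(1)})$ and $(X_\ell^{(2)})$ are bounded in the sense of (14.16). Compute the power in the signal 

\[
\sum_{\ell=-\infty}^{\infty} \Bigl( \bigl( A^{(1)} X_\ell^{(1)} + A^{(2)} X_\ell^{(2)} \bigr)\, \phi(t - 2\ell T_s)
+ \bigl( A^{(1)} X_\ell^{(1)} - A^{(2)} X_\ell^{(2)} \bigr)\, \phi\bigl(t - (2\ell + 1) T_s\bigr) \Bigr).
\]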


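Exercise 14.10 (More on Orthogonal Code-Division Multi-Accessing). Extend the result 
of Exercise 14.9 to the case with $\eta$ data streams, where the transmitted signal is given by 

\begin{align*}
\sum_{\ell=-\infty}^{\infty} \Bigl( & \bigl( a_1^{(1)} A^{(1)} X_\ell^{(1)} + \cdots + a_1^{(\eta)} A^{(\eta)} X_\ell^{(\eta)} \bigr)\, \phi(t - \eta \ell T_s) \\
& + \cdots + \bigl( a_\eta^{(1)} A^{(1)} X_\ell^{(1)} + \cdots + a_\eta^{(\eta)} A^{(\eta)} X_\ell^{(\eta)} \bigr)\, \phi\bigl(t - (\eta \ell + \eta - 1) T_s\bigr) \Bigr)
\end{align*}

and where the real numbers $a_\iota^{(\nu)}$ for $\iota, \nu \in \{1, \ldots, \eta\}$ satisfy the orthogonality condition 

\[
\sum_{\iota=1}^{\eta} a_\iota^{(\nu)} a_\iota^{(\nu')} = \begin{cases} \eta & \text{if } \nu = \nu', \\ 0 & \text{if } \nu \ne \nu', \end{cases} \quad \nu, \nu' \in \{1, \ldots, \eta\}.
\]

The sequence $a_1^{(\nu)}, \ldots, a_\eta^{(\nu)}$ is sometimes called the signature of the $\nu$-th stream. 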


Exercise 14.11 (The Samples of the Self-Similarity Function). Let $g \colon \mathbb{R} \to \mathbb{R}$ be of finite 
energy, and let $R_{gg}$ be its self-similarity function. 

(i) Show that there exists an integrable nonnegative function $G \colon [-1/2, 1/2) \to [0, \infty)$ 
such that 
\[
R_{gg}(m T_s) = \int_{-1/2}^{1/2} G(\theta)\, e^{-\mathrm{i} 2\pi m \theta}\, d\theta, \quad m \in \mathbb{Z},
\]
and such that $G(-\theta) = G(\theta)$ for all $|\theta| < 1/2$. Express G(·) in terms of the FT of $g$. 

(ii) Show that if the samples of the self-similarity function are absolutely summable, 
i.e., if 
\[
\sum_{m \in \mathbb{Z}} |R_{gg}(m T_s)| < \infty,
\]
then the function 
\[
\theta \mapsto \sum_{m \in \mathbb{Z}} R_{gg}(m T_s)\, e^{\mathrm{i} 2\pi m \theta}, \quad \theta \in [-1/2, 1/2),
\]
is such a function, and it is continuous. 

(iii) Show that if $(X_\ell)$ is of PSD $S_{XX}$, then the RHS of (14.31) can be expressed as 
\[
\frac{A^2}{T_s} \int_{-1/2}^{1/2} G(\theta)\, S_{XX}(\theta)\, d\theta.
\]


Exercise 14.12 (A Bound on the Power in PAM). Let G(·) be as in Exercise 14.11. 


(i) Show that if $(X_\ell)$ is of zero mean, of unit variance, and has a PSD, then the RHS 
of (14.31) is upper-bounded by 

\[
\frac{1}{T_s}\, A^2 \sup_{-1/2 \le \theta < 1/2} G(\theta). \tag{14.80}
\]

(ii) Suppose now that G(·) is continuous. Show that for every $\epsilon > 0$, there exists a 
zero-mean unit-variance SP $(X_\ell)$ with a PSD for which the RHS of (14.31) is within $\epsilon$ 
of (14.80). 


Chapter 15 


Operational Power Spectral Density 


15.1 Introduction 


The Power Spectral Density of a stochastic process tells us more about the SP than 
just its power. It tells us something about how this power is distributed among 
the different frequencies that the SP occupies. The purpose of this chapter is to 
clarify this statement and to derive the PSD of PAM signals. Most of this chapter 
is written informally with an emphasis on ideas and intuition as opposed to math- 
ematical rigor. The mathematically-inclined readers will find precise statements 
of the key results of this chapter in Section 15.5. We emphasize that this chapter 
only deals with real continuous-time stochastic processes. 


The classical definition of the PSD of continuous-time stochastic processes (Defini- 
tion 25.7.2 ahead) is only applicable to wide-sense stationary stochastic processes, 
and PAM signals are not WSS.¹ Consequently, we shall have to introduce a new 
concept, which we call the operational power spectral density, or the operational 
PSD for short.² This new concept is applicable to a large family of 
stochastic processes that includes most WSS processes and most PAM signals. 
For WSS stochastic processes, the operational PSD and the classical PSD coin- 
cide (Section 25.14). In addition to being more general, the operational PSD is 
more intuitive in that it clarifies the origin of the words “power spectral density.” 
Moreover, it gives an operational meaning to the concept. 


15.2 Motivation 


To motivate the new definition we shall first briefly discuss other “densities” such 
as charge density, mass density, and probability density. 


¹ If the discrete-time symbol sequence is stationary then the PAM signal is cyclostationary. 
But this term will not be used in this book. 

² These terms are not standard. Most of the literature does not seem to distinguish between 
the PSD in the sense of Definition 25.7.2 and what we call the operational PSD. 


    function                               quantity of interest    per unit of
    charge (spatial) density               charge                  space
    mass (spatial) density                 mass                    space
    mass line density                      mass                    length
    probability (per unit of X) density    probability             unit of X
    power spectral density                 power                   spectrum (Hz)

Table 15.1: Various densities and their units 


In electromagnetism one encounters the concept of charge density, which is often 
denoted by $\varrho(\cdot)$. It measures the amount of charge per unit volume. Since the 
charge need not be uniformly distributed, $\varrho(\cdot)$ is typically not constant so the charge 
density is a function of location. Thus, we usually write $\varrho(x, y, z)$ for the charge 
density at the location $(x, y, z)$. This can be defined differentially or integrally. 
The differential definition is 


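\begin{align*}
\varrho(x, y, z)
&= \lim_{\Delta \downarrow 0} \frac{\text{Charge in Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}}{\text{Volume of Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}} \\
&= \lim_{\Delta \downarrow 0} \frac{\text{Charge in Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}}{\Delta^3},
\end{align*}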

and the integral definition is that a function $\varrho(\cdot)$ is the charge density if for every 
region $\mathcal{D} \subset \mathbb{R}^3$ 

\[
\text{Charge in } \mathcal{D} = \int_{(x, y, z) \in \mathcal{D}} \varrho(x, y, z)\, dx\, dy\, dz, \quad \mathcal{D} \subset \mathbb{R}^3.
\]

Ignoring some mathematical subtleties, the two definitions are equivalent. Perhaps 
a more appropriate name for charge density is “Charge Spatial Density,” which 
makes it clear that the quantity of interest is charge and that we are interested in 
the way it is distributed in space. The units of $\varrho(x, y, z)$ are those of charge per 
unit volume. 

Mass density (or as we would prefer to call it, “Mass Spatial Density”) is 
analogously defined, either differentially, as 

\begin{align*}
\varrho(x, y, z)
&= \lim_{\Delta \downarrow 0} \frac{\text{Mass in Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}}{\text{Volume of Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}} \\
&= \lim_{\Delta \downarrow 0} \frac{\text{Mass in Box } \{(x', y', z') : |x - x'| \le \tfrac{\Delta}{2},\ |y - y'| \le \tfrac{\Delta}{2},\ |z - z'| \le \tfrac{\Delta}{2}\}}{\Delta^3},
\end{align*}

or integrally as the function $\varrho(x, y, z)$ such that for every subset $\mathcal{D} \subset \mathbb{R}^3$ 

\[
\text{Mass in } \mathcal{D} = \int_{(x, y, z) \in \mathcal{D}} \varrho(x, y, z)\, dx\, dy\, dz, \quad \mathcal{D} \subset \mathbb{R}^3.
\]

The units are those of mass per unit volume. Since mass is nonnegative, the 
differential definition of mass density makes it clear that mass density must also 
be nonnegative. This is slightly less apparent from the integral definition, but 
(excluding subsets of $\mathbb{R}^3$ of measure zero) is true nonetheless. By convention, if 
one defines mass density integrally, then one typically insists that the density be 
nonnegative. 


Similarly, in discussing mass line density one envisions a one-dimensional object, 
and its density with respect to unit length is defined differentially as 

\[
\varrho(x) = \lim_{\Delta \downarrow 0} \frac{\text{Mass in Interval } \{x' : |x - x'| \le \tfrac{\Delta}{2}\}}{\Delta},
\]

or integrally as the nonnegative function $\varrho(\cdot)$ such that for every subset $\mathcal{D} \subset \mathbb{R}$ of 
the real line 

\[
\text{Mass in } \mathcal{D} = \int_{x \in \mathcal{D}} \varrho(x)\, dx, \quad \mathcal{D} \subset \mathbb{R}.
\]

The units are units of mass per unit length. 


In probability theory one encounters the probability density function of a random 
variable X. Here the quantity of interest is probability, and we are interested in 
how it is distributed on the real line. The units depend on the units of X. Thus, if 
X measures the time in days until at least one piece in your new china set breaks, 
then the units of the probability density function $f_X(\cdot)$ of X are those of probability 
(unit-less) per day. The probability density function can be defined differentially 
as 

\[
f_X(x) = \lim_{\Delta \downarrow 0} \frac{\Pr\bigl[ X \in [x - \tfrac{\Delta}{2},\, x + \tfrac{\Delta}{2}] \bigr]}{\Delta}
\]

or integrally by requiring that for every subset $\mathcal{E} \subset \mathbb{R}$ 

\[
\Pr[X \in \mathcal{E}] = \int_{x \in \mathcal{E}} f_X(x)\, dx, \quad \mathcal{E} \subset \mathbb{R}. \tag{15.1}
\]

Again, since probabilities are nonnegative, the differential definition makes it clear 
that the probability density function is nonnegative. In the integral definition we 
typically add the nonnegativity as a condition. That is, we say that $f_X(\cdot)$ is a 
density function for the random variable X if $f_X(\cdot)$ is nonnegative and if (15.1) 
holds. (There is a technical uniqueness issue that we are sweeping under the rug 
here: if $f_X(\cdot)$ is a probability density function for X and if $\xi(\cdot)$ is a nonnegative 
function that differs from $f_X(\cdot)$ only on a set of Lebesgue measure zero, then $\xi(\cdot)$ 
is also a probability density function for X.) 
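
The differential definition can be tried out numerically on a distribution whose density is known. The following minimal sketch assumes, purely for illustration, that X is exponentially distributed with a given rate; it computes the probability of a small interval around x from the distribution function and divides by the interval length. The function names and parameter values are hypothetical.

```python
import math


def exp_cdf(x, rate):
    """Pr[X <= x] for an exponential random variable with the given rate."""
    return -math.expm1(-rate * x) if x > 0 else 0.0


def density_estimate(x, delta, rate):
    """Probability of the interval [x - delta/2, x + delta/2] divided by delta."""
    return (exp_cdf(x + delta / 2, rate) - exp_cdf(x - delta / 2, rate)) / delta


RATE, X0 = 2.0, 1.0
exact = RATE * math.exp(-RATE * X0)  # the known exponential density at X0
for delta in (1.0, 0.1, 0.01, 0.001):
    print(delta, density_estimate(X0, delta, RATE), exact)
```

As delta shrinks, the ratio approaches the exact density, and its units are those of probability per unit of X, as in Table 15.1.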


With these examples in mind, it is natural to interpret the power spectral density 
of a stochastic process $(X(t), t \in \mathbb{R})$ as the distribution of the power of X(·) 
among the different frequencies. See Table 15.1. Heuristically, we 
would define the power spectral density $S_{XX}$ at the frequency $f$ differentially as 

\[
S_{XX}(f) = \lim_{\Delta \downarrow 0} \frac{\text{Power in the frequencies } [f - \tfrac{\Delta}{2},\, f + \tfrac{\Delta}{2}]}{\Delta}
\]

or integrally by requiring that for any subset $\mathcal{D}$ of the spectrum 

\[
\text{Power of } X \text{ in } \mathcal{D} = \int_{f \in \mathcal{D}} S_{XX}(f)\, df, \quad \mathcal{D} \subset \mathbb{R}. \tag{15.2}
\]
To make this meaningful we next explain what we mean by “the power of X in 
the frequencies $\mathcal{D}$.” To that end it is best to envision a filter of impulse response $h$ 
whose frequency response $\hat{h}$ is given by 

\[
\hat{h}(f) = \begin{cases} 1 & \text{if } f \in \mathcal{D}, \\ 0 & \text{otherwise,} \end{cases} \tag{15.3}
\]

and to think of the power of X(·) in the frequencies $\mathcal{D}$ as the average power at the 
output of that filter when it is fed X(·), i.e., the average power of the stochastic 
process $X \star h$.³ 

We are now almost ready to give a heuristic definition of the power spectral density. 
But there are three more points we would like to discuss first. The first is that 
(15.2) can also be rewritten as 

\[
\text{Power of } X \text{ in } \mathcal{D} = \int_{\text{all frequencies}} \mathrm{I}\{f \in \mathcal{D}\}\, S_{XX}(f)\, df, \quad \mathcal{D} \subset \mathbb{R}. \tag{15.4}
\]

It turns out that if (15.2) holds for all sets $\mathcal{D} \subset \mathbb{R}$ of frequencies, then it also holds 
for all “nice” filters (of a frequency response that is not necessarily {0,1} valued): 

\[
\text{Power of } X \star h = \int_{\text{all frequencies}} |\hat{h}(f)|^2\, S_{XX}(f)\, df, \quad h \text{ “nice.”} \tag{15.5}
\]

That (15.4) typically implies (15.5) can be heuristically argued as follows. By 
(15.4) the set of frequency responses $\hat{h}$ for which (15.5) holds includes all frequency 
responses of the form $\hat{h}(f) = \mathrm{I}\{f \in \mathcal{D}\}$. But if (15.5) holds for some frequency 
response $\hat{h}$, then it must also hold for $\alpha \hat{h}$, where $\alpha$ is any complex number, because 
scaling the frequency response by $\alpha$ merely multiplies the output power by $|\alpha|^2$. 
Also, if (15.5) holds for two responses $\hat{h}_1$ and $\hat{h}_2$ for which 

\[
\hat{h}_1(f)\, \hat{h}_2(f) = 0, \quad f \in \mathbb{R}, \tag{15.6}
\]

then it must also hold for $\hat{h}_1 + \hat{h}_2$, because Parseval’s Theorem and (15.6) imply 
that $X \star h_1$ and $X \star h_2$ must be orthogonal. Thus, (15.6) implies that the power 
in $X \star (h_1 + h_2)$ is the sum of the power in $X \star h_1$ and the power in $X \star h_2$. It 
thus intuitively follows that if (15.4) holds for all subsets $\mathcal{D}$ of the spectrum, then 
it holds for all step functions $\hat{h}(f) = \sum_\nu \alpha_\nu\, \mathrm{I}\{f \in \mathcal{D}_\nu\}$, where $\{\mathcal{D}_\nu\}$ are disjoint. 
And since any “nice” frequency response $\hat{h}$ can be arbitrarily well approximated 
by such step functions, we expect that (15.5) would hold for all “nice” responses. 


Having heuristically established that (15.2) implies (15.5), we prefer to define the 
PSD as a function $S_{XX}$ for which (15.5) holds, where “nice” will be taken to mean 
stable. 


The second point we would like to make is regarding uniqueness. For real stochastic 
processes it is reasonable to require that (15.5) hold only for filters of real impulse 
response. Thus we would require 

\[
\text{Power of } X \star h = \int_{\text{all frequencies}} |\hat{h}(f)|^2\, S_{XX}(f)\, df, \quad h \text{ real and “nice.”} \tag{15.7a}
\]


³ We are ignoring the fact that the RHS of (15.3) is typically not the frequency response of a 
stable filter. A stable filter has a continuous frequency response (Theorem 6.2.11 (i)). 
But since for filters of real impulse response the mapping $f \mapsto |\hat{h}(f)|^2$ is symmetric, 
(15.7a) can be rewritten as 

\[
\text{Power of } X \star h = \int_{0}^{\infty} |\hat{h}(f)|^2 \bigl( S_{XX}(f) + S_{XX}(-f) \bigr)\, df, \quad h \text{ real and “nice.”} \tag{15.7b}
\]

This form makes it clear that for real stochastic processes, (15.7a) (or its equivalent 
form (15.7b)) can only specify the function $f \mapsto S_{XX}(f) + S_{XX}(-f)$; it cannot fully 
specify the mapping $f \mapsto S_{XX}(f)$. For example, if a symmetric function $S_{XX}$ 
satisfies (15.7a), then so does 

\[
f \mapsto \begin{cases} 2\, S_{XX}(f) & \text{if } f > 0, \\ 0 & \text{otherwise.} \end{cases}
\]

In fact, if $S_{XX}$ satisfies (15.7a), then so does any function S(·) such that 

\[
S(f) + S(-f) = S_{XX}(f) + S_{XX}(-f), \quad f \in \mathbb{R}.
\]

Thus, for the sake of uniqueness, we define the power spectral density $S_{XX}$ to be 
a function of frequency that satisfies (15.7a) and that is additionally symmetric. 
It can be shown that this defines $S_{XX}$ (to within indistinguishability) uniquely. 
In fact, once one has identified a nonnegative function S(·) such that for any real 
impulse response $h$ the integral 

\[
\int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df
\]

corresponds to the power in $X \star h$, then the PSD $S_{XX}$ of X is given by the 
symmetrized version of S(·), i.e., 

\[
S_{XX}(f) = \frac{1}{2} \bigl( S(f) + S(-f) \bigr), \quad f \in \mathbb{R}. \tag{15.8}
\]

Note that the differential definition of the PSD would not have resolved the 
uniqueness issue because a filter of frequency response $f \mapsto \mathrm{I}\{f \in [f_0 - \tfrac{\Delta}{2},\, f_0 + \tfrac{\Delta}{2}]\}$ is 
not real. 
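
The non-uniqueness discussed here is easy to observe numerically. The sketch below uses a hypothetical symmetric candidate PSD and a hypothetical symmetric squared magnitude response (as a real filter would have), and compares the integral in (15.7a) for the symmetric function with the integral for its one-sided counterpart. All function names, the chosen shapes, and the integration grid are assumptions made for illustration.

```python
import math


def s_sym(f):
    """A hypothetical symmetric candidate PSD."""
    return 1.0 / (1.0 + f * f)


def s_one_sided(f):
    """Nonsymmetric alternative: all the weight moved to positive frequencies."""
    return 2.0 * s_sym(f) if f > 0 else 0.0


def gain(f):
    """|h_hat(f)|^2 of a hypothetical real filter; symmetric in f."""
    return math.exp(-f * f)


def output_power(psd, half_width=20.0, cells=40_000):
    """Midpoint-rule approximation of the integral of psd(f) * gain(f)."""
    df = 2.0 * half_width / cells
    total = 0.0
    for k in range(cells):
        f = -half_width + (k + 0.5) * df  # midpoints never hit f = 0
        total += psd(f) * gain(f)
    return total * df


def symmetrize(psd):
    """The symmetrization used in (15.8)."""
    return lambda f: 0.5 * (psd(f) + psd(-f))


print(output_power(s_sym), output_power(s_one_sided))
print(s_sym(1.3), symmetrize(s_one_sided)(1.3))
```

Both candidates predict the same output power for this filter, and symmetrizing the one-sided function recovers the symmetric one away from the single point f = 0, which has Lebesgue measure zero.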


The final point we would like to make is regarding additivity. Apart from some 
mathematical details, what makes the definition of charge density possible is the 
fact that the total charge in the union of two disjoint regions in space is the sum 
of charges in the individual regions. The same holds for mass. For the probability 
densities the crucial property is that the probability of the union of two disjoint 
events is the sum of the probabilities. Consequently, if $\mathcal{D}_1$ and $\mathcal{D}_2$ are disjoint 
subsets of $\mathbb{R}$, then $\Pr[X \in \mathcal{D}_1 \cup \mathcal{D}_2] = \Pr[X \in \mathcal{D}_1] + \Pr[X \in \mathcal{D}_2]$. Does this 
hold for power? In general the power in the sum of two signals is not the sum of 
the individual powers. But if the signals are orthogonal, then their powers do add. 
Thus, while Parseval’s theorem will not appear explicitly in our analysis of the PSD, 
it is really what makes it all possible. It demonstrates that if $\mathcal{D}_1, \mathcal{D}_2 \subset \mathbb{R}$ are disjoint 
frequency bands, then the signals $X \star h_1$ and $X \star h_2$ that result when X is passed 
through the filters of frequency response $\hat{h}_1(f) = \mathrm{I}\{f \in \mathcal{D}_1\}$ and $\hat{h}_2(f) = \mathrm{I}\{f \in \mathcal{D}_2\}$ 
are orthogonal, so their powers add. We will not bother to formulate this result 
precisely, because it does not show up in our analysis explicitly, but it is this result 
that allows us to define the power spectral density. 
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15.3 Defining the Operational PSD 


Recall that in (14.14) we defined the power P in a SP $(Y(t),\, t \in \mathbb{R})$ as

\[
  P = \lim_{T \to \infty} \frac{1}{2T}\, \mathrm{E}\!\left[ \int_{-T}^{T} Y^2(t)\, dt \right]
\]

whenever the limit exists. Thus, the power is the limit, as T tends to infinity, of
the ratio of the expected energy in the interval $[-T, T]$ to the interval's duration 2T.
We define the operational power spectral density of a stochastic process as follows.


Definition 15.3.1 (Operational PSD of a Real SP). We say that the continuous-time
real stochastic process $(X(t),\, t \in \mathbb{R})$ is of operational power spectral
density $S_{XX}$ if $(X(t),\, t \in \mathbb{R})$ is a measurable SP; the mapping $S_{XX}\colon \mathbb{R} \to \mathbb{R}$ is
integrable and symmetric; and for every stable real filter of impulse response $h \in \mathcal{L}_1$
the average power at the filter's output when it is fed $(X(t),\, t \in \mathbb{R})$ is given by

\[
  \text{Power in } \mathbf{X} \star \mathbf{h} = \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{h}(f)|^2 \, df.
\]


We chose our words very carefully in the above definition, and, in doing so, we 
avoided two issues. The first is whether every SP is of some operational PSD. 
The answer to that is “no.” (But most stochastic processes encountered in Digital 
Communications are.) The second issue we avoided is the uniqueness issue. Our 
wording did not indicate whether a SP could be of two different operational PSDs. 
It turns out that if a SP is of two different operational PSDs, then the two are 
equivalent in the sense that they agree except possibly on a set of frequencies of 
Lebesgue measure zero. Consequently, somewhat loosely, we shall speak of the
operational power spectral density of $(X(t),\, t \in \mathbb{R})$ even though the uniqueness is
only to within indistinguishability. The uniqueness is a corollary to the following
somewhat technical lemma.


Lemma 15.3.2. 


(i) If $s$ is an integrable function such that

\[
  \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2 \, df = 0 \tag{15.9}
\]

for every integrable complex function $h\colon \mathbb{R} \to \mathbb{C}$, then $s(f)$ is zero for all
frequencies outside a set of Lebesgue measure zero.


(ii) If $s$ is a symmetric function such that (15.9) holds for every integrable real
function $h\colon \mathbb{R} \to \mathbb{R}$, then $s(f)$ is zero for all frequencies outside a set of
Lebesgue measure zero.


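Proof. We begin with a proof of Part (i). For any $\lambda > 0$ and $f_0 \in \mathbb{R}$ define the
function $h\colon \mathbb{R} \to \mathbb{C}$ by

\[
  h(t) = \frac{1}{\sqrt{\lambda}}\, \mathrm{I}\{ |t| \le \lambda/2 \}\, e^{i 2\pi f_0 t}, \quad t \in \mathbb{R}. \tag{15.10}
\]

This function is in both $\mathcal{L}_1$ and $\mathcal{L}_2$. Since it is in $\mathcal{L}_2$, its self-similarity
function $R_{hh}(\tau)$ is defined at every $\tau \in \mathbb{R}$. In fact,

\[
  R_{hh}(\tau) = \Bigl( 1 - \frac{|\tau|}{\lambda} \Bigr)\, \mathrm{I}\{ |\tau| \le \lambda \}\, e^{i 2\pi f_0 \tau}, \quad \tau \in \mathbb{R}. \tag{15.11}
\]

And since $h \in \mathcal{L}_1$, it follows from (11.35) that the Fourier Transform of $R_{hh}$
is the mapping $f \mapsto |\hat{h}(f)|^2$. Consequently, by Proposition 6.2.3 (i) (with the
substitution of the mirror image of $R_{hh}$ for $g$), the mapping $f \mapsto |\hat{h}(f)|^2$
can be expressed as the Inverse Fourier Transform of the mirror image
$\tau \mapsto R_{hh}(-\tau)$ of $R_{hh}$. Thus, by (6.9) (with the substitutions of $s$ for $x$ and of the
mirror image of $R_{hh}$ for $g$),

\[
  \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2 \, df = \int_{-\infty}^{\infty} \hat{s}(f)\, R_{hh}(f)\, df. \tag{15.12}
\]

It now follows from (15.9), (15.12), and (15.11) that

\[
  \int_{-\lambda}^{\lambda} \Bigl( 1 - \frac{|f|}{\lambda} \Bigr)\, \hat{s}(f)\, e^{i 2\pi f f_0}\, df = 0, \quad \lambda > 0,\ f_0 \in \mathbb{R}. \tag{15.13}
\]

Part (i) now follows from (15.13) and from Theorem 6.2.12 (ii) (with the substitution
of $s$ for $x$ and with the substitution of $f_0$ for $t$).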


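We next turn to Part (ii). For any integrable complex function $h\colon \mathbb{R} \to \mathbb{C}$, define
$h_R \triangleq \mathrm{Re}(h)$ and $h_I \triangleq \mathrm{Im}(h)$ so

\[
  \hat{h}_R(f) = \frac{\hat{h}(f) + \hat{h}^*(-f)}{2}, \quad f \in \mathbb{R},
\]
\[
  \hat{h}_I(f) = \frac{\hat{h}(f) - \hat{h}^*(-f)}{2i}, \quad f \in \mathbb{R}.
\]

Consequently,

\[
  |\hat{h}_R(f)|^2 = \frac{1}{4} \Bigl( |\hat{h}(f)|^2 + |\hat{h}(-f)|^2 + 2\,\mathrm{Re}\bigl( \hat{h}(f)\, \hat{h}(-f) \bigr) \Bigr), \quad f \in \mathbb{R},
\]
\[
  |\hat{h}_I(f)|^2 = \frac{1}{4} \Bigl( |\hat{h}(f)|^2 + |\hat{h}(-f)|^2 - 2\,\mathrm{Re}\bigl( \hat{h}(f)\, \hat{h}(-f) \bigr) \Bigr), \quad f \in \mathbb{R},
\]

and

\[
  |\hat{h}_R(f)|^2 + |\hat{h}_I(f)|^2 = \frac{1}{2} \Bigl( |\hat{h}(f)|^2 + |\hat{h}(-f)|^2 \Bigr), \quad f \in \mathbb{R}. \tag{15.14}
\]

Applying the lemma's hypothesis to the real functions $h_R$ and $h_I$ we obtain

\[
  0 = \int_{-\infty}^{\infty} s(f)\, |\hat{h}_R(f)|^2 \, df,
\]
\[
  0 = \int_{-\infty}^{\infty} s(f)\, |\hat{h}_I(f)|^2 \, df,
\]

and thus, upon adding the equations,

\begin{align*}
  0 &= \int_{-\infty}^{\infty} s(f) \Bigl( |\hat{h}_R(f)|^2 + |\hat{h}_I(f)|^2 \Bigr)\, df \\
    &= \frac{1}{2} \int_{-\infty}^{\infty} s(f) \Bigl( |\hat{h}(f)|^2 + |\hat{h}(-f)|^2 \Bigr)\, df \\
    &= \frac{1}{2} \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2 \, df + \frac{1}{2} \int_{-\infty}^{\infty} s(-f)\, |\hat{h}(f)|^2 \, df \\
    &= \int_{-\infty}^{\infty} s(f)\, |\hat{h}(f)|^2 \, df, \tag{15.15}
\end{align*}

where the second equality follows from (15.14); the third by writing the integral
of the sum as a sum of integrals and by changing the integration variable in the
integral involving $\hat{h}(-f)$; and the last equality from the hypothesis that $s$ is
symmetric. Since we have established (15.15) for every complex $h\colon \mathbb{R} \to \mathbb{C}$, we can now
apply Part (i) to conclude that $s$ is zero at all frequencies outside a set of Lebesgue
measure zero.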
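The identity (15.14) is the algebraic heart of Part (ii). The Python sketch below checks a discrete analogue of it, in which the DFT of a short complex sequence stands in for the Fourier Transform and the index $-k$ modulo the length plays the role of $-f$. Both the DFT stand-in and the sequence `h` are assumptions made only for illustration.

```python
import cmath

def dft(x):
    """Plain O(n^2) discrete Fourier transform of the sequence x."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

# Hypothetical complex "impulse response" used only for this check.
h = [1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.75j, -2 + 0.5j]
n = len(h)

H = dft(h)
H_R = dft([z.real for z in h])   # transform of the real part of h
H_I = dft([z.imag for z in h])   # transform of the imaginary part of h

for k in range(n):
    lhs = abs(H_R[k]) ** 2 + abs(H_I[k]) ** 2
    rhs = 0.5 * (abs(H[k]) ** 2 + abs(H[(-k) % n]) ** 2)
    # Discrete counterpart of (15.14).
    assert abs(lhs - rhs) < 1e-9
```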


Corollary 15.3.3 (Uniqueness of PSD). If both $S_{XX}$ and $S'_{XX}$ are operational
PSDs for the real SP $(X(t),\, t \in \mathbb{R})$, then the set of frequencies at which they differ
is of Lebesgue measure zero.


Proof. Apply Lemma 15.3.2 (ii) to the function $s\colon f \mapsto S_{XX}(f) - S'_{XX}(f)$.


As noted above, we make here no general claims about the existence of opera- 
tional PSDs. Under certain restrictions that are made precise in Section 15.5, the 
operational PSD is defined for PAM signals. And by Theorem 25.13.2, the oper- 
ational PSD always exists for measurable, centered, WSS, stochastic processes of 
integrable autocovariance functions. 


Definition 15.3.4 (Bandlimited Stochastic Processes). We say that a stochastic
process $(X(t),\, t \in \mathbb{R})$ of operational PSD $S_{XX}$ is bandlimited to W Hz if, except
on a set of frequencies of Lebesgue measure zero, $S_{XX}(f)$ is zero for all frequencies $f$
satisfying $|f| > W$.


The smallest W to which $(X(t),\, t \in \mathbb{R})$ is limited is called the bandwidth of
$(X(t),\, t \in \mathbb{R})$.


15.4 The Operational PSD of Real PAM Signals 


Computing the operational PSD of PAM signals is much easier than you might
expect. This is because, as we next show, passing a PAM signal of pulse shape $g$
through a stable filter of impulse response $h$ is tantamount to changing its pulse
shape from $g$ to $g \star h$:

\[
  \Bigl( \Bigl( \sigma \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(\sigma - \ell T_s) \Bigr) \star h \Bigr)(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, (g \star h)(t - \ell T_s), \quad t \in \mathbb{R}. \tag{15.16}
\]

(For a formal statement of this result, see Corollary 18.6.2, which also addresses the
difficulty that arises when the sum is infinite.) Consequently, if one can compute
the power in a PAM signal of arbitrary pulse shape (as explained in Chapter 14),
then one can also compute the power in a filtered PAM signal.


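That filtering a PAM signal is tantamount to convolving its pulse shape with the
impulse response follows from two properties of the convolution: that it is linear

\[
  (\alpha u + \beta v) \star h = \alpha\, u \star h + \beta\, v \star h
\]

and that convolving a delayed version of a signal with $h$ is equivalent to convolving
the original signal and delaying the result

\[
  \bigl( \bigl( \sigma \mapsto u(\sigma - t_0) \bigr) \star h \bigr)(t) = (u \star h)(t - t_0), \quad t, t_0 \in \mathbb{R}.
\]

Indeed, if $\mathbf{X}$ is the PAM signal

\[
  X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \tag{15.17}
\]

then (15.16) follows from the calculation

\begin{align*}
  (\mathbf{X} \star \mathbf{h})(t) &= \Bigl( \Bigl( \sigma \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(\sigma - \ell T_s) \Bigr) \star h \Bigr)(t) \\
    &= A \sum_{\ell=-\infty}^{\infty} X_\ell \int_{-\infty}^{\infty} h(s)\, g(t - s - \ell T_s)\, ds \\
    &= A \sum_{\ell=-\infty}^{\infty} X_\ell\, (g \star h)(t - \ell T_s), \quad t \in \mathbb{R}. \tag{15.18}
\end{align*}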
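The calculation (15.18) can be mimicked in discrete time, where every sum is finite and the difficulty with infinite sums does not arise. The Python sketch below compares filtering a sampled PAM signal with building the PAM signal directly from the convolved pulse. The sample grid, the symbol values, the rectangular pulse `g`, and the three-tap filter `h` are all hypothetical choices made for illustration.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def pam(symbols, pulse, spacing, amplitude=1.0):
    """Samples of A * sum_l X_l * pulse(t - l * spacing) on an integer grid."""
    out = [0.0] * (spacing * (len(symbols) - 1) + len(pulse))
    for l, x in enumerate(symbols):
        for k, p in enumerate(pulse):
            out[l * spacing + k] += amplitude * x * p
    return out

symbols = [1, -1, -1, 1, 1]   # hypothetical symbol realization
g = [1.0, 1.0, 1.0, 1.0]      # hypothetical sampled pulse shape
h = [0.5, 0.3, 0.2]           # hypothetical sampled impulse response
spacing = 4                   # samples per baud period
A = 2.0

filtered = convolve(pam(symbols, g, spacing, A), h)   # (X * h)
reshaped = pam(symbols, convolve(g, h), spacing, A)   # PAM with pulse g * h

assert len(filtered) == len(reshaped)
assert all(abs(u - v) < 1e-12 for u, v in zip(filtered, reshaped))
```

The agreement relies only on the two properties named above, linearity and the fact that delaying the input delays the output, so it holds for any choice of the hypothetical sequences.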


We are now ready to apply the results of Chapter 14 on the power in PAM signals
to study the power in filtered PAM signals and hence to derive the operational
PSD of PAM signals. We will not treat the case discussed in Section 14.5.3 where
the only assumption is that the time shifts of the pulse shape by integer multiples
of $T_s$ are orthonormal, because this orthonormality is typically lost under filtering.


15.4.1 $(X_\ell)$ Are Centered, Uncorrelated, and of Equal Variance


We begin with the case where the symbols $(X_\ell,\, \ell \in \mathbb{Z})$ are of zero mean, uncorrelated,
and of equal variance $\sigma_X^2$. As in (15.17) we denote the PAM signal by
$(X(t),\, t \in \mathbb{R})$ and study its operational PSD by studying the power in $\mathbf{X} \star \mathbf{h}$.
Using (15.18) we obtain that $\mathbf{X} \star \mathbf{h}$ is the PAM signal $\mathbf{X}$ but with the pulse shape $g$
replaced by $g \star h$. Consequently, using Expression (14.33) for the power in PAM
with zero-mean, uncorrelated, variance-$\sigma_X^2$ symbols, we obtain that the power in
$\mathbf{X} \star \mathbf{h}$ is given by

\begin{align*}
  \text{Power in } \mathbf{X} \star \mathbf{h} &= \frac{A^2}{T_s}\, \sigma_X^2\, \| g \star h \|_2^2 \\
    &= \frac{A^2}{T_s}\, \sigma_X^2 \int_{-\infty}^{\infty} |\hat{g}(f)|^2\, |\hat{h}(f)|^2 \, df \\
    &= \int_{-\infty}^{\infty} \underbrace{\Bigl( \frac{A^2 \sigma_X^2}{T_s}\, |\hat{g}(f)|^2 \Bigr)}_{S_{XX}(f)} |\hat{h}(f)|^2 \, df, \tag{15.19}
\end{align*}

where the first equality follows from (14.33) applied to the PAM signal of pulse
shape $g \star h$; the second follows from Parseval's Theorem by noting that the Fourier
Transform of a convolution of two signals is the product of their Fourier Transforms;
and where the third equality follows by rearranging terms. From (15.19) and from
the fact that $f \mapsto |\hat{g}(f)|^2$ is a symmetric function (because $g$ is real), it follows
that the operational PSD of the PAM signal $(X(t),\, t \in \mathbb{R})$ when $(X_\ell,\, \ell \in \mathbb{Z})$ are
zero-mean, uncorrelated, and of variance $\sigma_X^2$ is given by

\[
  S_{XX}(f) = \frac{A^2 \sigma_X^2}{T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.20}
\]
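To make (15.20) concrete, the following Python sketch evaluates it for a rectangular pulse and checks, by crude numerical integration, that the PSD integrates to $(A^2 \sigma_X^2 / T_s)\, \|g\|_2^2$, as Parseval's Theorem requires. The pulse $t \mapsto \mathrm{I}\{|t| \le T_s/2\}$, whose Fourier Transform is $T_s\, \mathrm{sinc}(f T_s)$, and all numerical values are assumptions made for this example only.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def psd_uncorrelated(f, amplitude, sigma2, ts, g_hat):
    """Right-hand side of (15.20): (A^2 sigma^2 / Ts) |g_hat(f)|^2."""
    return amplitude ** 2 * sigma2 / ts * abs(g_hat(f)) ** 2

def integrate(func, lo, hi, n):
    """Midpoint Riemann sum with n equal cells."""
    step = (hi - lo) / n
    return step * sum(func(lo + (i + 0.5) * step) for i in range(n))

ts = 0.5          # hypothetical baud period
amplitude = 3.0   # hypothetical amplitude A
sigma2 = 1.0      # hypothetical symbol variance

def g_hat(f):
    """FT of the rectangular pulse I{|t| <= ts/2} (assumed pulse shape)."""
    return ts * sinc(f * ts)

total = integrate(lambda f: psd_uncorrelated(f, amplitude, sigma2, ts, g_hat),
                  -200 / ts, 200 / ts, 40000)
expected = amplitude ** 2 * sigma2 / ts * ts   # ||g||_2^2 equals ts for this pulse

# The sinc^2 tails decay slowly, so only approximate agreement is expected.
assert abs(total - expected) / expected < 1e-2
```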


15.4.2 $(X_\ell)$ Is Centered and WSS


The more general case where the symbols $(X_\ell,\, \ell \in \mathbb{Z})$ are not necessarily uncorrelated
but form a centered, WSS, discrete-time SP can be treated with the
same ease via (14.31) or (14.32). As above, passing $\mathbf{X}$ through a filter of impulse
response $h$ results in a PAM signal with identical symbols but with pulse shape
$g \star h$. Consequently, the resulting power can be computed by substituting $g \star h$
for $g$ in (14.32) to obtain that the power in $\mathbf{X} \star \mathbf{h}$ is given by

\[
  \text{Power in } \mathbf{X} \star \mathbf{h} = \int_{-\infty}^{\infty} \underbrace{\Bigl( \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2 \Bigr)}_{S_{XX}(f)} |\hat{h}(f)|^2 \, df,
\]

where again we are using the fact that the FT of $g \star h$ is $f \mapsto \hat{g}(f)\, \hat{h}(f)$. The
operational PSD is thus

\[
  S_{XX}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}, \tag{15.21}
\]

because, as we next argue, the RHS of the above is a symmetric function of $f$.
This symmetry follows from the symmetry of $|\hat{g}(\cdot)|$ (because the pulse shape $g$
is real) and from the symmetry of the autocovariance function $K_{XX}$ (because the
symbols $(X_\ell,\, \ell \in \mathbb{Z})$ are real; see (13.12)). Note that (15.21) reduces to (15.20) if
$K_{XX}(m) = \sigma_X^2\, \mathrm{I}\{m = 0\}$.
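The following Python sketch evaluates the right-hand side of (15.21) for an autocovariance with finitely many nonzero terms and confirms the two remarks above: the expression is symmetric in $f$, and it reduces to (15.20) for uncorrelated symbols. The autocovariance values, the rectangular pulse, and the numbers are hypothetical and serve only as an illustration.

```python
import cmath
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def psd_wss(f, amplitude, ts, autocov, g_hat):
    """Right-hand side of (15.21); autocov maps the lag m to K_XX(m).

    The real part is taken because the sum is real whenever K_XX is symmetric.
    """
    spectral_sum = sum(k * cmath.exp(2j * math.pi * f * m * ts)
                       for m, k in autocov.items())
    return amplitude ** 2 / ts * spectral_sum.real * abs(g_hat(f)) ** 2

ts = 1.0          # hypothetical baud period
amplitude = 1.0   # hypothetical amplitude A

def g_hat(f):
    """FT of the rectangular pulse I{|t| <= ts/2} (assumed pulse shape)."""
    return ts * sinc(f * ts)

# Hypothetical symmetric autocovariance: K(0) = 1, K(1) = K(-1) = 0.5.
correlated = {-1: 0.5, 0: 1.0, 1: 0.5}
# Uncorrelated symbols of variance 2: K(m) = 2 * I{m = 0}.
uncorrelated = {0: 2.0}

f = 0.3
closed_form = (amplitude ** 2 / ts
               * (1.0 + math.cos(2 * math.pi * f * ts)) * abs(g_hat(f)) ** 2)
assert math.isclose(psd_wss(f, amplitude, ts, correlated, g_hat), closed_form)
# Symmetry in f.
assert math.isclose(psd_wss(f, amplitude, ts, correlated, g_hat),
                    psd_wss(-f, amplitude, ts, correlated, g_hat))
# Reduction to (15.20) when K_XX(m) = sigma^2 I{m = 0}.
assert math.isclose(psd_wss(f, amplitude, ts, uncorrelated, g_hat),
                    amplitude ** 2 * 2.0 / ts * abs(g_hat(f)) ** 2)
```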
15.4.3 The Operational PSD in Bi-Infinite Block-Mode


We now assume, as in Section 14.5.2, that the (K, N) binary-to-reals block encoder
$\mathrm{enc}\colon \{0,1\}^K \to \mathbb{R}^N$ is used in bi-infinite block encoding mode to map the
bi-infinite IID random bits $(D_j,\, j \in \mathbb{Z})$ to the bi-infinite sequence of real numbers
$(X_\ell,\, \ell \in \mathbb{Z})$, and that the transmitted signal is

\[
  X(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \tag{15.22}
\]

where $T_s > 0$ is the baud, and where $g(\cdot)$ is a pulse shape satisfying the decay
condition (14.17). We do not assume that the time-shifts of $g(\cdot)$ by integer multiples
of $T_s$ are orthogonal, or that the symbols $(X_\ell,\, \ell \in \mathbb{Z})$ are uncorrelated. We do,
however, continue to assume that the N-tuple $\mathrm{enc}(D_1, \ldots, D_K)$ is of zero mean
whenever $D_1, \ldots, D_K$ are IID random bits.


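We shall determine the operational PSD of $\mathbf{X}$ by computing the power of the signal
that results when $\mathbf{X}$ is fed to a stable filter of impulse response $h$. As before, we note
that feeding $\mathbf{X}$ through a filter of impulse response $h$ is tantamount to replacing
its pulse shape $g$ by $g \star h$. The power of this output signal can be thus computed
from our expression for the power in bi-infinite block encoding with PAM signaling
(14.38) but with the pulse shape being $g \star h$ and hence of FT $f \mapsto \hat{g}(f)\, \hat{h}(f)$:

\[
  \text{Power in } \mathbf{X} \star \mathbf{h} = \int_{-\infty}^{\infty} \underbrace{\Bigl( \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathrm{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2 \Bigr)}_{S_{XX}(f)} |\hat{h}(f)|^2 \, df.
\]

As we next show, the underbraced term is a symmetric function of $f$, and we thus
conclude that the PSD of $\mathbf{X}$ is:

\[
  S_{XX}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathrm{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.23}
\]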
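The right-hand side of (15.23) is straightforward to evaluate once the second moments $\mathrm{E}[X_\ell X_{\ell'}]$ are known. The Python sketch below obtains them by enumerating all inputs of a toy encoder and then evaluates the double sum. The (1, 3) encoder `enc`, the rectangular pulse, and the numerical values are hypothetical and are not taken from the text.

```python
import cmath
import math
from itertools import product

def second_moments(enc, k):
    """Matrix of E[X_l X_l'] when enc is fed k IID equiprobable bits."""
    words = [enc(bits) for bits in product((0, 1), repeat=k)]
    n = len(words[0])
    return [[sum(w[a] * w[b] for w in words) / len(words) for b in range(n)]
            for a in range(n)]

def psd_block_mode(f, amplitude, ts, m2, g_hat):
    """Right-hand side of (15.23) for the second-moment matrix m2."""
    n = len(m2)
    double_sum = sum(m2[a][b] * cmath.exp(2j * math.pi * f * (a - b) * ts)
                     for a in range(n) for b in range(n))
    # The double sum is real because m2 is a symmetric matrix.
    return amplitude ** 2 / (n * ts) * double_sum.real * abs(g_hat(f)) ** 2

def enc(bits):
    """Hypothetical zero-mean (1, 3) encoder: 0 -> (+1, +1, -1), 1 -> (-1, -1, +1)."""
    sign = 1 - 2 * bits[0]
    return (sign, sign, -sign)

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

ts = 1.0          # hypothetical baud period
amplitude = 1.0   # hypothetical amplitude A

def g_hat(f):
    """FT of the rectangular pulse I{|t| <= ts/2} (assumed pulse shape)."""
    return ts * sinc(f * ts)

m2 = second_moments(enc, 1)

# At f = 0 the double sum equals (1 + 1 - 1)^2 = 1, so the PSD is A^2 / (N Ts).
assert math.isclose(psd_block_mode(0.0, amplitude, ts, m2, g_hat), 1.0 / 3.0)
# The expression is symmetric in f.
assert math.isclose(psd_block_mode(0.2, amplitude, ts, m2, g_hat),
                    psd_block_mode(-0.2, amplitude, ts, m2, g_hat))
```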


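To see that the RHS of (15.23) is a symmetric function of $f$, use the identities

\[
  \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} a_{\ell,\ell'} = \sum_{\ell=1}^{N} a_{\ell,\ell} + \sum_{\ell=1}^{N} \sum_{\ell'=1}^{\ell-1} \bigl( a_{\ell,\ell'} + a_{\ell',\ell} \bigr)
\]

and $\mathrm{E}[X_\ell X_{\ell'}] = \mathrm{E}[X_{\ell'} X_\ell]$ to rewrite the RHS of (15.23) in the symmetric form

\[
  \frac{A^2}{N T_s} \Bigl( \sum_{\ell=1}^{N} \mathrm{E}[X_\ell^2] + \sum_{\ell=1}^{N} \sum_{\ell'=1}^{\ell-1} 2\, \mathrm{E}[X_\ell X_{\ell'}] \cos\bigl( 2\pi f (\ell - \ell') T_s \bigr) \Bigr) |\hat{g}(f)|^2.
\]

From (15.23) we obtain: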


Theorem 15.4.1 (The Bandwidth of PAM Is that of the Pulse Shape). Suppose
that the operational PSD in bi-infinite block-mode of a PAM signal $(X(t))$ is as
given in (15.23), e.g., that the conditions of Theorem 15.5.2 ahead are satisfied.
Further assume

\[
  A \ne 0, \quad \sum_{\ell=1}^{N} \mathrm{E}[X_\ell^2] > 0, \tag{15.24}
\]

e.g., that $(X(t))$ is not deterministically zero. Then the bandwidth of the SP $(X(t))$
is equal to the bandwidth of the pulse shape $g$.


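Proof. If $g$ is bandlimited to W Hz, then so is $(X(t))$, because, by (15.23),

\[
  \bigl( \hat{g}(f) = 0 \bigr) \implies \bigl( S_{XX}(f) = 0 \bigr).
\]

We next complete the proof by showing that there are at most a countable number
of frequencies $f$ such that $S_{XX}(f) = 0$ but $\hat{g}(f) \ne 0$. From (15.23) it follows
that to show this it suffices to show that there are at most a countable number of
frequencies $f$ such that $\sigma(f) = 0$, where

\begin{align*}
  \sigma(f) &\triangleq \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathrm{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s} \\
    &= \sum_{m=-N+1}^{N-1} \gamma_m\, e^{i 2\pi f m T_s} \\
    &= \sum_{m=-N+1}^{N-1} \gamma_m\, z^m \Big|_{z = e^{i 2\pi f T_s}}, \tag{15.25}
\end{align*}

and

\[
  \gamma_m = \frac{A^2}{N T_s} \sum_{\ell=\max\{1,\, m+1\}}^{\min\{N,\, N+m\}} \mathrm{E}[X_\ell X_{\ell-m}], \quad m \in \{-N+1, \ldots, N-1\}. \tag{15.26}
\]

It follows from (15.25) that $\sigma(f)$ is zero if, and only if, $e^{i 2\pi f T_s}$ is a root of the
mapping

\[
  z \mapsto \sum_{m=-N+1}^{N-1} \gamma_m\, z^m.
\]

Since $e^{i 2\pi f T_s}$ is of unit magnitude, it follows that $\sigma(f)$ is zero if, and only if, $e^{i 2\pi f T_s}$
is a root of the polynomial

\[
  z \mapsto \sum_{\nu=0}^{2N-2} \gamma_{\nu - N + 1}\, z^\nu. \tag{15.27}
\]

From (15.26) and (15.24) it follows that $\gamma_0 > 0$, so the polynomial in (15.27) is
not zero. Consequently, since it is of degree at most $2N - 2$, it has at most $2N - 2$ distinct
roots and, a fortiori, at most $2N - 2$ distinct roots of unit magnitude. Denote these
roots by

\[
  e^{i\theta_1}, \ldots, e^{i\theta_d},
\]

where $d \le 2N - 2$ and $\theta_1, \ldots, \theta_d \in [-\pi, \pi)$. Since $f$ satisfies $e^{i 2\pi f T_s} = e^{i\theta}$ if, and
only if,

\[
  f = \frac{\theta}{2\pi T_s} + \frac{\eta}{T_s}
\]

for some $\eta \in \mathbb{Z}$, we conclude that the set of frequencies $f$ satisfying $\sigma(f) = 0$ is the
set

\[
  \Bigl\{ \frac{\theta_1}{2\pi T_s} + \frac{\eta}{T_s} : \eta \in \mathbb{Z} \Bigr\} \cup \cdots \cup \Bigl\{ \frac{\theta_d}{2\pi T_s} + \frac{\eta}{T_s} : \eta \in \mathbb{Z} \Bigr\},
\]

and is thus countable. (The union of a finite (or countable) number of countable
sets is countable.)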
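The regrouping of the double sum into the coefficients $\gamma_m$ of (15.26), and the passage from the mapping in (15.25) to the polynomial (15.27), can be checked numerically. The Python sketch below does so for a hypothetical second-moment matrix built from the sign pattern $(+1, +1, -1)$. The matrix, the amplitude, and the baud period are illustrative assumptions, and the indices are 0-based whereas the text counts from 1.

```python
import cmath
import math

def gammas(m2, amplitude, ts):
    """Coefficients gamma_m of (15.26), keyed by m in {-N+1, ..., N-1}."""
    n = len(m2)
    scale = amplitude ** 2 / (n * ts)
    return {m: scale * sum(m2[l][l - m] for l in range(max(0, m), min(n, n + m)))
            for m in range(-n + 1, n)}

def sigma_double_sum(f, m2, amplitude, ts):
    """First line of (15.25): the double sum over l and l'."""
    n = len(m2)
    return amplitude ** 2 / (n * ts) * sum(
        m2[a][b] * cmath.exp(2j * math.pi * f * (a - b) * ts)
        for a in range(n) for b in range(n))

def sigma_from_gammas(f, gam, ts):
    """Second line of (15.25): the single sum over m."""
    return sum(g * cmath.exp(2j * math.pi * f * m * ts) for m, g in gam.items())

pattern = (1, 1, -1)   # hypothetical block, used only for illustration
m2 = [[x * y for y in pattern] for x in pattern]
n = len(m2)
amplitude, ts = 1.0, 1.0

gam = gammas(m2, amplitude, ts)
assert gam[0] > 0   # this is what keeps the polynomial (15.27) from being zero

# Coefficients of the polynomial (15.27), lowest power first.
poly = [gam[v - n + 1] for v in range(2 * n - 1)]

for f in (0.0, 0.13, 0.4, -0.27):
    direct = sigma_double_sum(f, m2, amplitude, ts)
    assert abs(direct - sigma_from_gammas(f, gam, ts)) < 1e-12
    # On the unit circle the polynomial equals z^(N-1) times the mapping in (15.25).
    z = cmath.exp(2j * math.pi * f * ts)
    p = sum(c * z ** v for v, c in enumerate(poly))
    assert abs(p - z ** (n - 1) * direct) < 1e-12
```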


15.5 A More Formal Account 


In this section we shall give a more formal account of the power at the output of 
a stable filter that is fed a PAM signal. There are two approaches to this. The 
first is based on carefully justifying the steps in our informal derivation.4 This 
approach is pursued in Section 18.6.5, where the results are generalized to complex 
pulse shapes and complex symbols. The second approach is to convert the problem 
into one about WSS stochastic processes and to then rely heavily on Sections 25.13 
and 25.14 on the filtering of WSS stochastic processes and, in particular, on the 
Wiener-Khinchin Theorem (Theorem 25.14.1). For the benefit of readers who have 
already encountered the Wiener-Khinchin Theorem we follow this latter approach 
here. We ask the readers to note that the Wiener-Khinchin Theorem is not directly 
applicable here because the PAM signal is not WSS. A “stationarization argument” 
is thus needed. 


The key results of this section are the following two theorems. 


Theorem 15.5.1. Consider the setup of Theorem 14.6.4 with the additional assumption
that the autocovariance function $K_{XX}$ of $(X_\ell)$ is absolutely summable:

\[
  \sum_{m=-\infty}^{\infty} |K_{XX}(m)| < \infty. \tag{15.28}
\]

Let $h \in \mathcal{L}_1$ be the impulse response of a stable real filter. Then:


(i) The PAM signal

\[
  \mathbf{X}\colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{15.29}
\]

is bounded in the sense that there exists a constant $\Gamma$ such that

\[
  |X(\omega, t)| \le \Gamma, \quad \bigl( \omega \in \Omega,\ t \in \mathbb{R} \bigr). \tag{15.30}
\]


4 The main difficulties in the justification are in making (15.16) rigorous and in controlling
the decay of $g \star h$ for arbitrary $h \in \mathcal{L}_1$.




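(ii) For every $\omega \in \Omega$ the convolution of the sample-path $t \mapsto X(\omega, t)$ with $h$ is
defined at every epoch.


(iii) The stochastic process

\[
  (\omega, t) \mapsto \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma, \quad \bigl( \omega \in \Omega,\ t \in \mathbb{R} \bigr) \tag{15.31}
\]

that results when the sample-paths of $\mathbf{X}$ are convolved with $h$ is a measurable
stochastic process of power

\[
  P = \int_{-\infty}^{\infty} \Bigl( \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2 \Bigr) |\hat{h}(f)|^2 \, df. \tag{15.32}
\]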


Theorem 15.5.2. Consider the setup of Theorem 14.6.5. Let $h \in \mathcal{L}_1$ be the impulse
response of a real stable filter. Then:


(i) The sample-paths of the PAM stochastic process

\[
  \mathbf{X}\colon (\omega, t) \mapsto A \sum_{\ell=-\infty}^{\infty} X_\ell(\omega)\, g(t - \ell T_s) \tag{15.33}
\]

are bounded in the sense of (15.30).


(ii) For every $\omega \in \Omega$ the convolution of the sample-path $t \mapsto X(\omega, t)$ and $h$ is
defined at every epoch.


(iii) The stochastic process $(X(t),\, t \in \mathbb{R}) \star h$ that results when the sample-paths
of $\mathbf{X}$ are convolved with $h$ is a measurable stochastic process of power

\[
  P = \int_{-\infty}^{\infty} \Bigl( \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathrm{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2 \Bigr) |\hat{h}(f)|^2 \, df, \tag{15.34}
\]

where $(X_1, \ldots, X_N) = \mathrm{enc}(D_1, \ldots, D_K)$, and where $D_1, \ldots, D_K$ are IID random
bits.


Proof of Theorem 15.5.1. Part (i) is a consequence of the assumption that $(X_\ell)$
is bounded in the sense of (14.16) and that the pulse shape $g$ decays faster than $1/t$
in the sense of (14.17).


Part (ii) is a consequence of the fact that the convolution of a bounded function 
with an integrable function is defined at every epoch; see Section 5.5. 


We next turn to Part (iii). The proof of the measurability of the convolution of
$(X(t),\, t \in \mathbb{R})$ with $h$ is a bit technical. It is very similar to the proof of
Theorem 25.13.2 (i). As in that proof, we first note that it suffices to prove the result
for functions $h$ that are Borel measurable; the extension to Lebesgue measurable
functions will then follow by approximating $h$ by a Borel measurable function that
differs from it on a set of Lebesgue measure zero (Rudin, 1974, Chapter 7, Lemma 1)
and by then noting that the convolution of $t \mapsto X(\omega, t)$ with $h$ is unaltered when $h$
is replaced by a function that differs from it on a set of Lebesgue measure zero. We
thus assume that $h$ is Borel measurable. Consequently, the mapping from $\mathbb{R}^2$ to $\mathbb{R}$
defined by $(t, \sigma) \mapsto h(t - \sigma)$ is also Borel measurable, because it is the composition
of the continuous (and hence Borel measurable) mapping $(t, \sigma) \mapsto t - \sigma$ with the
Borel measurable mapping $t \mapsto h(t)$.


As in the proof of Theorem 25.13.2, we prove the measurability of the convolution
of $(X(t),\, t \in \mathbb{R})$ with $h$ by proving the measurability of the mapping defined by
$(\omega, t) \mapsto (1 + t^2)^{-1} \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma$. To this end we study the function

\[
  \bigl( (\omega, t), \sigma \bigr) \mapsto \frac{X(\omega, \sigma)\, h(t - \sigma)}{1 + t^2}, \quad \bigl( (\omega, t) \in \Omega \times \mathbb{R},\ \sigma \in \mathbb{R} \bigr). \tag{15.35}
\]

This function is measurable because, as noted above, $(t, \sigma) \mapsto h(t - \sigma)$ is measurable;
because, by Proposition 14.6.2, $(X(t),\, t \in \mathbb{R})$ is measurable; and because the
product of Borel measurable functions is Borel measurable (Rudin, 1974, Chapter 1,
Section 1.9 (c)). Moreover, using (15.30) and Fubini's Theorem it can be
readily verified that this function is integrable. Using Fubini's Theorem again, we
conclude that the function

\[
  (\omega, t) \mapsto \frac{1}{1 + t^2} \int_{-\infty}^{\infty} X(\omega, \sigma)\, h(t - \sigma)\, d\sigma
\]

is measurable. Consequently, so is $\mathbf{X} \star \mathbf{h}$.


To conclude the proof we now need to compute the power in the measurable
(nonstationary) SP $\mathbf{X} \star \mathbf{h}$. This will be done in a roundabout way. We shall first define
a new SP $\mathbf{X}'$. This SP is centered, measurable, and WSS so the power in $\mathbf{X}' \star \mathbf{h}$ can
be computed using Theorem 25.14.1. We shall then show that the powers of $\mathbf{X} \star \mathbf{h}$
and $\mathbf{X}' \star \mathbf{h}$ are equal and hence that from the power in $\mathbf{X}' \star \mathbf{h}$ we can immediately
obtain the power in $\mathbf{X} \star \mathbf{h}$.


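We begin by defining the SP $(X'(t),\, t \in \mathbb{R})$ as

\[
  X'(t) = X(t + S), \quad t \in \mathbb{R}, \tag{15.36a}
\]

where $S$ is independent of $(X(t))$ and uniformly distributed over the interval $[0, T_s)$,

\[
  S \sim \mathcal{U}\bigl( [0, T_s) \bigr). \tag{15.36b}
\]

That $(X'(t))$ is centered follows from the calculation

\begin{align*}
  \mathrm{E}[X'(t)] &= \mathrm{E}[X(t + S)] \\
    &= \frac{1}{T_s} \int_{0}^{T_s} \mathrm{E}[X(t + s)]\, ds \\
    &= 0,
\end{align*}

where the first equality follows from the definition of $(X'(t))$; the second from the
independence of $(X(t))$ and $S$ and from the specific form of the density of $S$; and
the third because $(X(t))$ is centered. That $(X'(t))$ is measurable follows because
the mapping $\bigl( (\omega, s), t \bigr) \mapsto X(\omega, t + s)$ can be written as the composition of the
mapping $\bigl( (\omega, s), t \bigr) \mapsto (\omega, t + s)$ with the mapping $(\omega, t) \mapsto X(\omega, t)$. And that it is
WSS follows from the calculation

\begin{align*}
  \mathrm{E}[X'(t)\, X'(t + \tau)]
    &= \mathrm{E}[X(t + S)\, X(t + S + \tau)] \\
    &= \frac{1}{T_s} \int_{0}^{T_s} \mathrm{E}[X(t + s)\, X(t + s + \tau)]\, ds \\
    &= \frac{A^2}{T_s} \int_{0}^{T_s} \mathrm{E}\Bigl[ \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t + s - \ell T_s) \sum_{\ell'=-\infty}^{\infty} X_{\ell'}\, g(t + s + \tau - \ell' T_s) \Bigr]\, ds \\
    &= \frac{A^2}{T_s} \int_{0}^{T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathrm{E}[X_\ell X_{\ell'}]\, g(t + s - \ell T_s)\, g(t + s + \tau - \ell' T_s)\, ds \\
    &= \frac{A^2}{T_s} \int_{0}^{T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} K_{XX}(\ell - \ell')\, g(t + s - \ell T_s)\, g(t + s + \tau - \ell' T_s)\, ds \\
    &= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} \int_{0}^{T_s} g(t + s - \ell T_s)\, g\bigl( t + s + \tau - (\ell - m) T_s \bigr)\, ds \\
    &= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \sum_{\ell=-\infty}^{\infty} \int_{-\ell T_s + t}^{-(\ell - 1) T_s + t} g(\xi)\, g(\xi + \tau + m T_s)\, d\xi \\
    &= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m) \int_{-\infty}^{\infty} g(\xi)\, g(\xi + \tau + m T_s)\, d\xi \\
    &= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, R_{gg}(m T_s + \tau), \quad \tau, t \in \mathbb{R}. \tag{15.37}
\end{align*}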

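Note that (15.37) also shows that $(X'(t))$ is of PSD (as defined in Definition 25.7.2)

\[
  S_{X'X'}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}, \tag{15.38}
\]

which is integrable by the absolute summability of $K_{XX}$.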
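The effect of the random shift $S$ in (15.36) can be seen numerically in the simplest case of uncorrelated unit-variance symbols, where $K_{XX}(m) = \mathrm{I}\{m = 0\}$ and (15.37) reduces to $(A^2 / T_s)\, R_{gg}(\tau)$. The Python sketch below averages the product of shifted pulses over $s \in [0, T_s)$ and confirms that the result does not depend on $t$. The rectangular pulse $\mathrm{I}\{0 \le t < T_s\}$, the truncation of the sum over $\ell$, and the step count are assumptions of this illustration.

```python
def g(t, ts):
    """Hypothetical rectangular pulse I{0 <= t < ts}."""
    return 1.0 if 0.0 <= t < ts else 0.0

def r_gg(tau, ts):
    """Self-similarity function of the rectangular pulse: max(0, ts - |tau|)."""
    return max(0.0, ts - abs(tau))

def shift_averaged_autocorr(t, tau, ts, amplitude=1.0, n_steps=2000,
                            l_range=range(-5, 6)):
    """(A^2 / Ts) * integral over s in [0, Ts) of sum_l g(t+s-l Ts) g(t+s+tau-l Ts).

    This is the second line of (15.37) specialized to K_XX(m) = I{m = 0};
    the integral is approximated by a midpoint Riemann sum.
    """
    step = ts / n_steps
    total = 0.0
    for i in range(n_steps):
        s = (i + 0.5) * step
        total += sum(g(t + s - l * ts, ts) * g(t + s + tau - l * ts, ts)
                     for l in l_range)
    return amplitude ** 2 / ts * total * step

ts = 1.0
for t in (0.0, 0.37, 2.2):          # the answer should not depend on t
    for tau in (0.25, -0.6, 1.5):
        approx = shift_averaged_autocorr(t, tau, ts)
        exact = r_gg(tau, ts) / ts  # last line of (15.37) with A = 1
        assert abs(approx - exact) < 5e-3
```

Without the averaging over $s$, the same product of pulses would depend on $t$, which is the reason the PAM signal itself is not WSS.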


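Defining $(Y'(t),\, t \in \mathbb{R})$ to be $(X'(t),\, t \in \mathbb{R}) \star h$ we can now use Theorem 25.14.1
to compute the power in $(Y'(t),\, t \in \mathbb{R})$:

\[
  \lim_{T \to \infty} \frac{1}{2T}\, \mathrm{E}\Bigl[ \int_{-T}^{T} \bigl( Y'(t) \bigr)^2 dt \Bigr] = \int_{-\infty}^{\infty} \Bigl( \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{XX}(m)\, e^{i 2\pi f m T_s}\, |\hat{g}(f)|^2 \Bigr) |\hat{h}(f)|^2 \, df.
\]

To conclude the proof we next show that the power in $\mathbf{Y} \triangleq \mathbf{X} \star \mathbf{h}$ is the same as the power
in $\mathbf{Y}'$. To that end we first note that from (15.36a) it follows that

\[
  (\mathbf{X}' \star \mathbf{h})\bigl( (\omega, s), t \bigr) = (\mathbf{X} \star \mathbf{h})(\omega, t + s), \quad \bigl( \omega \in \Omega,\ 0 \le s < T_s,\ t \in \mathbb{R} \bigr),
\]

i.e., that

\[
  Y'\bigl( (\omega, s), t \bigr) = Y(\omega, t + s), \quad \bigl( \omega \in \Omega,\ 0 \le s < T_s,\ t \in \mathbb{R} \bigr). \tag{15.39}
\]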
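It thus follows that

\[
  \int_{-T}^{T} Y^2(\omega, t)\, dt \le \int_{-T - T_s}^{T} \bigl( Y'((\omega, s), t) \bigr)^2 dt, \quad \bigl( \omega \in \Omega,\ 0 \le s < T_s \bigr), \tag{15.40}
\]

because

\begin{align*}
  \int_{-T - T_s}^{T} \bigl( Y'((\omega, s), t) \bigr)^2 dt &= \int_{-T - T_s}^{T} Y^2(\omega, t + s)\, dt \\
    &= \int_{-T - T_s + s}^{T + s} Y^2(\omega, \sigma)\, d\sigma \\
    &\ge \int_{-T}^{T} Y^2(\omega, \sigma)\, d\sigma, \quad 0 \le s < T_s,
\end{align*}

where the equality in the first line follows from (15.39); the equality in the second
line from the substitution $\sigma = t + s$; and the final inequality from the nonnegativity
of the integrand and because $0 \le s < T_s$.


Similarly,

\[
  \int_{-T}^{T} Y^2(\omega, t)\, dt \ge \int_{-T}^{T - T_s} \bigl( Y'((\omega, s), t) \bigr)^2 dt, \quad \bigl( \omega \in \Omega,\ 0 \le s < T_s \bigr), \tag{15.41}
\]

because

\begin{align*}
  \int_{-T}^{T - T_s} \bigl( Y'((\omega, s), t) \bigr)^2 dt &= \int_{-T}^{T - T_s} Y^2(\omega, t + s)\, dt \\
    &= \int_{-T + s}^{T - T_s + s} Y^2(\omega, \sigma)\, d\sigma \\
    &\le \int_{-T}^{T} Y^2(\omega, \sigma)\, d\sigma, \quad 0 \le s < T_s.
\end{align*}

Combining (15.40) and (15.41) and using the nonnegativity of the integrand we
obtain that for every $\omega \in \Omega$ and $s \in [0, T_s)$

\[
  \int_{-T + T_s}^{T - T_s} \bigl( Y'((\omega, s), t) \bigr)^2 dt \le \int_{-T}^{T} Y^2(\omega, \sigma)\, d\sigma \le \int_{-T - T_s}^{T + T_s} \bigl( Y'((\omega, s), t) \bigr)^2 dt. \tag{15.42}
\]

Dividing by 2T and taking expectations we obtain

\begin{align*}
  \frac{2T - 2T_s}{2T} \cdot \frac{1}{2T - 2T_s}\, \mathrm{E}\Bigl[ \int_{-T + T_s}^{T - T_s} \bigl( Y'(t) \bigr)^2 dt \Bigr]
    &\le \frac{1}{2T}\, \mathrm{E}\Bigl[ \int_{-T}^{T} Y^2(t)\, dt \Bigr] \\
    &\le \frac{2T + 2T_s}{2T} \cdot \frac{1}{2T + 2T_s}\, \mathrm{E}\Bigl[ \int_{-T - T_s}^{T + T_s} \bigl( Y'(t) \bigr)^2 dt \Bigr], \tag{15.43}
\end{align*}

from which the equality between the power in $\mathbf{Y}'$ and in $\mathbf{Y}$ follows by letting T
tend to infinity and using the Sandwich Theorem.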
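Proof of Theorem 15.5.2. The proof of Theorem 15.5.2 is very similar to the proof
of Theorem 15.5.1, so most of the details will be omitted. The main difference is
that the process $(X'(t),\, t \in \mathbb{R})$ is now defined as

\[
  X'(t) = X(t + S),
\]

where the random variable $S$ is now uniformly distributed over the interval $[0, N T_s)$,

\[
  S \sim \mathcal{U}\bigl( [0, N T_s) \bigr).
\]

With this definition, the autocovariance of $(X'(t),\, t \in \mathbb{R})$ can be computed as

\begin{align*}
  K_{X'X'}(\tau)
    &= \mathrm{E}[X(t + S)\, X(t + \tau + S)] \\
    &= \frac{1}{N T_s} \int_{0}^{N T_s} \mathrm{E}[X(t + s)\, X(t + \tau + s)]\, ds \\
    &= \frac{A^2}{N T_s} \int_{0}^{N T_s} \mathrm{E}\Bigl[ \sum_{\nu=-\infty}^{\infty} u(\mathbf{X}_\nu, t + s - \nu N T_s) \sum_{\nu'=-\infty}^{\infty} u(\mathbf{X}_{\nu'}, t + \tau + s - \nu' N T_s) \Bigr]\, ds \\
    &= \frac{A^2}{N T_s} \int_{0}^{N T_s} \sum_{\nu=-\infty}^{\infty} \sum_{\nu'=-\infty}^{\infty} \mathrm{E}\bigl[ u(\mathbf{X}_\nu, t + s - \nu N T_s)\, u(\mathbf{X}_{\nu'}, t + \tau + s - \nu' N T_s) \bigr]\, ds \\
    &= \frac{A^2}{N T_s} \int_{0}^{N T_s} \sum_{\nu=-\infty}^{\infty} \mathrm{E}\bigl[ u(\mathbf{X}_\nu, t + s - \nu N T_s)\, u(\mathbf{X}_\nu, t + \tau + s - \nu N T_s) \bigr]\, ds \\
    &= \frac{A^2}{N T_s} \int_{0}^{N T_s} \sum_{\nu=-\infty}^{\infty} \mathrm{E}\bigl[ u(\mathbf{X}_0, t + s - \nu N T_s)\, u(\mathbf{X}_0, t + \tau + s - \nu N T_s) \bigr]\, ds \\
    &= \frac{A^2}{N T_s} \int_{-\infty}^{\infty} \mathrm{E}\bigl[ u(\mathbf{X}_0, \xi)\, u(\mathbf{X}_0, \xi + \tau) \bigr]\, d\xi \\
    &= \frac{A^2}{N T_s} \int_{-\infty}^{\infty} \mathrm{E}\Bigl[ \sum_{\eta=1}^{N} X_\eta\, g(\xi - \eta T_s) \sum_{\eta'=1}^{N} X_{\eta'}\, g(\xi + \tau - \eta' T_s) \Bigr]\, d\xi \\
    &= \frac{A^2}{N T_s} \sum_{\eta=1}^{N} \sum_{\eta'=1}^{N} \mathrm{E}[X_\eta X_{\eta'}]\, R_{gg}\bigl( \tau + (\eta - \eta') T_s \bigr), \quad \tau \in \mathbb{R},
\end{align*}

where the third equality follows from (14.36), (14.39), and (14.40); the fifth follows
from (14.43); the sixth because the N-tuples $(\mathbf{X}_\nu,\, \nu \in \mathbb{Z})$ are IID; the seventh by
defining $\xi \triangleq t + s - \nu N T_s$; the eighth by the definition (14.40) of the function $u(\cdot)$; and the
final equality by swapping the summations and the expectation.


The process $(X'(t))$ is thus a WSS process of PSD (as defined in Definition 25.7.2)

\[
  S_{X'X'}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathrm{E}[X_\ell X_{\ell'}]\, e^{i 2\pi f (\ell - \ell') T_s}\, |\hat{g}(f)|^2, \quad f \in \mathbb{R}. \tag{15.44}
\]

The proof proceeds now along the same lines as the proof of Theorem 15.5.1.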


15.6 Exercises


Exercise 15.1 (Scaling a SP). Let $(Y(t))$ be the result of scaling the SP $(X(t))$ by the
real number $\alpha$. Thus, $Y(t) = \alpha X(t)$ for every epoch $t \in \mathbb{R}$. Show that if $(X(t))$ is of
operational PSD $S_{XX}$, then $(Y(t))$ is of operational PSD $f \mapsto \alpha^2 S_{XX}(f)$.


Exercise 15.2 (The Operational PSD of a Sum of Independent SPs). Intuition suggests
that if $(X(t))$ and $(Y(t))$ are centered independent stochastic processes of operational
PSDs $S_{XX}$ and $S_{YY}$, then their sum should be of operational PSD $f \mapsto S_{XX}(f) + S_{YY}(f)$.
Explain why.


Exercise 15.3 (Operational PSD of a Deterministic SP). Let (X(t)) be deterministically 
equal to the energy-limited signal g: R — R in the sense that, at every epoch t € R, the 
RV X(t) is deterministically equal to g(t). Find the operational PSD of (X(t)). 


Exercise 15.4 (Stretching Time). Let $(X(t))$ be of operational PSD $S_{XX}$, and let $a > 0$
be fixed. Define the SP $(Y(t))$ at every epoch $t \in \mathbb{R}$ as $Y(t) = X(t/a)$. Show that $(Y(t))$
is of operational PSD $f \mapsto a\, S_{XX}(a f)$.


Exercise 15.5 (The Operational PSD is Nonnegative). Show that if (X(t), t € R) is of 
operational PSD Sxx, then Sxx(f) must be nonnegative outside a set of frequencies of 
Lebesgue measure zero. Would this also have been true if we had not insisted that the 
operational PSD be symmetric? 


Hint: Proceed along the lines of the proof of Lemma 15.3.2. 


Exercise 15.6 (Operational PSD of PAM). Let $(X_\ell,\, \ell \in \mathbb{Z})$ be IID with $X_\ell$ taking on
the values $\pm 1$ equiprobably. Let

\[
  X_1(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R},
\]

where $A, T_s > 0$ are deterministic.
(i) Plot a sample function of $\mathbf{X}_1$ for a realization of $(X_\ell,\, \ell \in \mathbb{Z})$ of your choice.
(ii) Compute the operational PSD of $\mathbf{X}_1$.
(iii) Repeat Parts (i) and (ii) for


\[
  X_2(t) = A \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - 2\ell T_s), \quad t \in \mathbb{R}.
\]


(iv) How do the operational PSDs of $\mathbf{X}_1$ and $\mathbf{X}_2$ compare?


Exercise 15.7 (Spectral Shaping via Precoding). Let $(X_\ell,\, \ell \in \mathbb{Z})$ be IID with $X_\ell$ taking
on the values $\pm 1$ equiprobably. Let $\tilde{X}_\ell = X_\ell + X_{\ell-1}$ for every $\ell \in \mathbb{Z}$.


(i) Compute the operational PSD of the PAM signal

\[
  X_1(t) = \sum_{\ell=-\infty}^{\infty} \tilde{X}_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}
\]

for $g(\cdot)$ decaying to zero sufficiently fast as $|t| \to \infty$, e.g., satisfying (14.17).

(ii) Throw mathematical caution to the wind and evaluate your answer for the pulse
shape whose FT is


a=Hlls a}. ser. 


(Ignore the fact that this pulse shape does not satisfy (14.17).) Plot your answer
and compare it to the operational PSD of the PAM signal

\[
  X_2(t) = \sum_{\ell=-\infty}^{\infty} X_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}.
\]

(iii) Show that $\mathbf{X}_1$ can also be written as a PAM signal with IID symbols but with a
different pulse shape. That is,

\[
  X_1(t) = \sum_{\ell=-\infty}^{\infty} X_\ell\, h(t - \ell T_s), \qquad h\colon t \mapsto g(t) + g(t - T_s).
\]


Exercise 15.8 (The Operational PSD and Block Codes). PAM is used in block-mode in
conjunction with the (1, 2) binary-to-reals block encoder

\[
  0 \mapsto (+1, -1), \qquad 1 \mapsto (-1, +1)
\]

to transmit IID random bits. The pulse shape $g(\cdot)$ satisfies the decay condition (14.17).
Compute the power and operational PSD of the signal.


Exercise 15.9 (Repetitions and the Operational PSD). Let (X(t)) be the signal (15.22) 
that results when the (1,2) binary-to-reals block-encoder (10.4) is used in bi-infinite block- 
mode. Find the operational PSD of (X(t)). 


Exercise 15.10 (Direct-Sequence Spread-Spectrum Communications). This problem is
motivated by uncoded Direct-Sequence Spread-Spectrum communications with processing
gain N. Let the (1, N) binary-to-reals block encoder map 0 to the sequence $a_1, \ldots, a_N$
and 1 to $-a_1, \ldots, -a_N$. Consider PAM with bi-infinite block encoding with this mapping.
Express the operational PSD of the resulting PAM signal in terms of the sequence
$a_1, \ldots, a_N$ and the pulse shape $g$. Calculate explicitly when the pulse shape is the mapping
$t \mapsto \mathrm{I}\{|t| \le T_s/2\}$ for two cases: when the sequence $a_1, \ldots, a_N$ is the Barker-7 code
$(+1, +1, +1, -1, -1, +1, -1)$ and when it is the sequence $(+1, +1, +1, +1, +1, +1, +1)$.
Compare the latter case with the case where the mapping is the antipodal mapping
$0 \mapsto +1$ and $1 \mapsto -1$, the baud period is $7T_s$, and the pulse shape is $t \mapsto \mathrm{I}\{|t| \le 7T_s/2\}$.


Chapter 16 


Quadrature Amplitude Modulation 


16.1 Introduction 


We next discuss linear modulation in passband. We envision being allocated bandwidth
W around the carrier frequency $f_c$, so we can only send real signals whose
Fourier Transform is zero at frequencies $f$ satisfying $\bigl| |f| - f_c \bigr| > W/2$. That
is, the FT of the transmitted signal is allowed to be nonzero only in the frequency
interval $[f_c - W/2,\, f_c + W/2]$ and in its negative frequency counterpart
$[-f_c - W/2,\, -f_c + W/2]$ (Definition 7.3.1). We assume throughout this chapter
that

\[
  f_c > \frac{W}{2}. \tag{16.1}
\]


There are numerous ways to communicate in passband and, to complicate things 
further, sometimes seemingly different approaches lead to identical signals. Thus, 
while we would like to motivate the scheme we shall focus on, Quadrature Amplitude
Modulation (QAM), we cannot prove or claim that it is the only "optimal"
solution.1 Nevertheless, we shall try to motivate it by discussing some features
that one would typically like to have and by then showing that QAM has these
features.


From our studies of PAM we recall that if we are allocated (baseband) bandwidth
W Hz and if $T_s \ge 1/(2W)$, then we can find a bandwidth-W pulse shape
whose time shifts by integer multiples of $T_s$ are orthonormal. If $T_s = 1/(2W)$, then
such a pulse is the bandwidth-W unit-energy pulse $t \mapsto \sqrt{2W}\, \mathrm{sinc}(2Wt)$. (You may
recall that such pulses are rarely used because they decay to zero too slowly over
time, thus rendering the computation of the PAM signal unstable and the resulting
peak power unbounded.) And if $T_s < 1/(2W)$, then no such pulse shape exists.
(Corollary 11.3.5.)


From a somewhat more abstract perspective, PAM with the above pulse shape (or
with the square root of a raised-cosine pulse shape (11.29) with very small excess
bandwidth) allows us to send symbols arriving at rate

\[
  R_s \ \Bigl[ \frac{\text{real symbol}}{\text{second}} \Bigr]
\]

as the coefficients in a linear combination of orthonormal signals whose bandwidth
does not exceed (or only slightly exceeds)

\[
  \frac{R_s}{2} \ [\text{Hz}].
\]

That is, for each spectral sliver of 1 Hz at baseband we obtain 2 real dimensions
per second, i.e., we can communicate at spectral efficiency

\[
  2 \ \frac{[\text{real dimension/sec}]}{[\text{baseband Hz}]}.
\]

This is an achievement that we would like to replicate for passband signaling:


1 There are information theoretic considerations that show that QAM can achieve the capacity
of the bandlimited passband additive white Gaussian noise channel.


First Objective: Find a way to transmit real symbols arriving at rate $R_s$ real symbols
per second as the coefficients in a linear combination of orthonormal passband
signals occupying a (passband) bandwidth of W Hz around the carrier frequency $f_c$,
where the bandwidth W is equal to (or only slightly exceeds) $R_s/2$. That is, we
would like to find a communication scheme that would allow us to communicate at

\[
  2 \ \frac{[\text{real dimension/sec}]}{[\text{passband Hz}]}.
\]

Equivalently, since any stream of real symbols arriving at rate $R_s$ real symbols
per second can be viewed as a stream of complex symbols arriving at rate $R_s/2$
complex symbols per second (simply by pairing tuples $(a, b)$ of real numbers $a, b \in \mathbb{R}$
into single complex numbers $a + ib$), we can restate our objective as follows: find
a way to transmit complex symbols arriving at rate $R_s/2$ complex symbols per
second as the coefficients in a linear combination of orthonormal passband signals
occupying a (passband) bandwidth of W Hz around the carrier frequency $f_c$, where
the bandwidth W is equal to, or only slightly exceeds, $R_s/2$. That is, we would like
to find a communication scheme that would allow us to communicate at

\[
  1 \ \frac{[\text{complex dimension/sec}]}{[\text{passband Hz}]}. \tag{16.2}
\]


In addition, we would like our modulation scheme to be of reasonable complexity. 
One of the benefits of the baseband PAM scheme is that we can compute all the 
inner products required to reconstruct the coefficients (symbols) using the matched 
filter by feeding it with the transmitted signal and sampling its output at the 
appropriate times. 


A naive approach that does not achieve our objective is to use real baseband PAM
of the type we studied in Chapter 10 and to up-convert the PAM signal to passband
by multiplying it by the mapping $t \mapsto \cos(2\pi f_c t)$. The problem with this approach
is that the up-conversion doubles the bandwidth (Proposition 7.3.3).
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16.2. PAM for Passband? 


A natural approach to passband signaling might be to consider PAM directly without
any up-conversion. We merely have to look for a pulse shape $\phi$ whose Fourier
Transform is zero outside the band $\bigl||f| - f_c\bigr| \le W/2$ and whose
self-similarity function $R_{\phi\phi}$ is a Nyquist Pulse. It turns out that with
this approach we can only achieve our objective if $4 f_c T_s$ is an odd integer.
Indeed, the reader is encouraged to use Corollary 11.3.4 to verify that if a pulse
$\phi$ is an energy-limited passband signal that is bandlimited to $W$ Hz around the
carrier frequency $f_c$, and if its time shifts by integer multiples of $T_s$ are
orthonormal, then

$$T_s \ge \frac{1}{2W},$$

with equality being achievable only if both

$$|\hat{\phi}(f)|^2 = T_s\, \mathrm{I}\bigl\{\bigl||f| - f_c\bigr| \le W/2\bigr\}$$

(for all frequencies $f \in \mathbb{R}$ outside a set of Lebesgue measure zero) and

$$4 f_c T_s \text{ is an odd integer.} \tag{16.3}$$

In fact, it can be shown that if (16.3) is satisfied and if $\psi$ is any energy-limited
signal that is bandlimited to $W/2$ Hz and whose time shifts by integer multiples
of $2T_s$ are orthonormal, then the passband signal

$$\phi(t) = \sqrt{2}\cos(2\pi f_c t)\, \psi(t), \quad t \in \mathbb{R}$$

is an energy-limited passband signal that is bandlimited to $W$ Hz around the carrier
frequency $f_c$, and its time shifts by integer multiples of $T_s$ are orthonormal.


It would thus seem that if (16.3) is satisfied, then PAM would be a viable solution 
to our problem. Nevertheless, this is not the standard solution. The reason may 
have to do with implementation. If the above approach is used, then the carrier 
frequency influences the choice of the pulse shape. Thus, a radio with a selectable 
carrier frequency would require a different pulse shape for each frequency! More- 
over, the implementation of the modulator becomes carrier-dependent and fairly 
complex. This discussion motivates our second objective: 


Second Objective: To allow for flexibility in the choice of the carrier, it is desir- 
able to decouple the pulse shape selection from the carrier frequency. 


16.3 The QAM Signal


Quadrature Amplitude Modulation achieves both our objectives. It achieves our
desired spectral efficiency (16.2) and also decouples the signal design from the
carrier frequency. It is easiest to describe QAM by describing the baseband
representation $x_{\mathrm{BB}}(\cdot)$ of the transmitted passband signal
$x_{\mathrm{PB}}(\cdot)$. Indeed, the baseband representation of the transmitted
signal has the structure of PAM but with one important difference: we allow for
complex symbols and for complex pulse shapes.²


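² Allowing complex pulse shapes is not critical. Crucial is that we allow complex symbols.

In QAM the encoder

$$\varphi\colon \{0,1\}^k \to \mathbb{C}^n \tag{16.4}$$

maps $k$-tuples of data bits $(D_1, \ldots, D_k)$ to $n$-tuples of complex symbols
$(C_1, \ldots, C_n)$, and the baseband representation of the transmitted signal is

$$X_{\mathrm{BB}}(t) = A \sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{16.5a}$$

where the pulse shape $g(\cdot)$ may be complex (though it is often chosen to be real),
$A > 0$ is a real constant, $T_s > 0$ is the baud period, and $1/T_s$ is the baud rate.
The rate of the encoder is given by

$$\frac{k}{n}\ \left[\frac{\text{bit}}{\text{complex symbol}}\right], \tag{16.5b}$$

and the transmitted real passband QAM signal $X_{\mathrm{PB}}(\cdot)$ is given by

$$X_{\mathrm{PB}}(t) = 2\operatorname{Re}\bigl(X_{\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{16.5c}$$

Using (16.5a) & (16.5c) we can also express the QAM signal as

$$X_{\mathrm{PB}}(t) = 2\operatorname{Re}\Bigl(A \sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad t \in \mathbb{R}. \tag{16.6}$$

Alternatively, we can use the identities

$$\operatorname{Re}(wz) = \operatorname{Re}(w)\operatorname{Re}(z) - \operatorname{Im}(w)\operatorname{Im}(z), \quad w, z \in \mathbb{C},$$

$$\operatorname{Im}(z) = -\operatorname{Re}(iz), \quad z \in \mathbb{C}$$

to express the QAM signal as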


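$$
\begin{aligned}
X_{\mathrm{PB}}(t) ={}& \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\,
\underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{I,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{g_{I,\ell}(t)} \\
&+ \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\,
\underbrace{2\operatorname{Re}\Bigl(\underbrace{i\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{Q,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{g_{Q,\ell}(t)}, \quad t \in \mathbb{R},
\end{aligned}
\tag{16.7}
$$

where we define

$$
\begin{aligned}
g_{I,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R},
\end{aligned}
\tag{16.8a}
$$

and

$$
\begin{aligned}
g_{Q,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr), \quad t \in \mathbb{R},
\end{aligned}
\tag{16.8b}
$$

with corresponding baseband representations:

$$g_{I,\ell,\mathrm{BB}}(t) \triangleq \frac{1}{\sqrt{2}}\, g(t - \ell T_s), \quad t \in \mathbb{R}, \tag{16.9a}$$

$$g_{Q,\ell,\mathrm{BB}}(t) \triangleq i\frac{1}{\sqrt{2}}\, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{16.9b}$$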


Some comments about the QAM signal: 


(i) The representation (16.7) demonstrates that the QAM signal is a linear
combination of the waveforms $\{g_{I,\ell}\}$ and $\{g_{Q,\ell}\}$, where the
coefficients are proportional to the real parts and the imaginary parts of the
symbols $\{C_\ell\}$.


(ii) The normalization factor of $1/\sqrt{2}$ in the definition of the functions
$\{g_{I,\ell}\}$ and $\{g_{Q,\ell}\}$ is for convenience only. Its role will become
clearer in Section 16.5, where the pulse shape is chosen to be of unit energy. In
this case the factor of $1/\sqrt{2}$ guarantees that the functions $\{g_{I,\ell}\}$
and $\{g_{Q,\ell}\}$ are also of unit energy.


(iii) We could also view QAM slightly differently as a modulation scheme where
data bits $D_1, \ldots, D_k$ are mapped to $2n$ real numbers $X_1, \ldots, X_{2n}$,
which are then grouped in pairs to form the $n$ complex numbers
$C_\ell = X_{2\ell-1} + iX_{2\ell}$ for $\ell = 1, \ldots, n$ and where these complex
numbers are then mapped into the passband signal whose baseband representation is
given in (16.5a). The two views are, of course, completely equivalent.


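The expression for the QAM signal $X_{\mathrm{PB}}(\cdot)$ is simplified if the pulse
shape $g$ is real. In this case we obtain from (16.6) for every $t \in \mathbb{R}$

$$
\begin{aligned}
X_{\mathrm{PB}}(t) ={}& 2A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, g(t - \ell T_s) \cos(2\pi f_c t) \\
&- 2A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, g(t - \ell T_s) \sin(2\pi f_c t), \quad g \text{ real}.
\end{aligned}
\tag{16.10}
$$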


Thus, if the pulse shape $g$ is real, then the QAM signal can be viewed as the
sum of two signals: the first is the result of feeding $\{\operatorname{Re}(C_\ell)\}$
to a baseband PAM modulator of pulse shape $g$ and multiplying the result by
$\cos(2\pi f_c t)$, and the second is the result of feeding
$\{\operatorname{Im}(C_\ell)\}$ to a baseband PAM modulator of pulse shape $g$
and multiplying the result by $-\sin(2\pi f_c t)$. Figure 16.1 illustrates the
generation of the QAM signal when the pulse shape $g$ is real.

(Block diagram: the real and imaginary parts of $\{C_\ell\}$ each drive a PAM
modulator; the outputs are multiplied by $\cos(2\pi f_c t)$ and by the 90°-shifted
carrier $-\sin(2\pi f_c t)$, respectively, and summed to produce $x_{\mathrm{PB}}(t)/2$.)

Figure 16.1: Generating a QAM signal when the pulse shape g is real.
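The equivalence between the complex form (16.6) and the in-phase/quadrature form (16.10) for a real pulse can be checked numerically. The following is a minimal sketch under stated assumptions: the triangular pulse, the parameter values, and all function names are hypothetical choices made only for illustration.

```python
import cmath
import math


def qam_passband_complex_form(t, symbols, g, A, Ts, fc):
    """Evaluate (16.6): 2 Re(A sum_l C_l g(t - l Ts) e^{i 2 pi fc t})."""
    x_bb = A * sum(c * g(t - (l + 1) * Ts) for l, c in enumerate(symbols))
    return 2.0 * (x_bb * cmath.exp(2j * math.pi * fc * t)).real


def qam_passband_iq_form(t, symbols, g, A, Ts, fc):
    """Evaluate (16.10), which is valid only for a real pulse shape g."""
    in_phase = sum(c.real * g(t - (l + 1) * Ts) for l, c in enumerate(symbols))
    quadrature = sum(c.imag * g(t - (l + 1) * Ts) for l, c in enumerate(symbols))
    return (2.0 * A * in_phase * math.cos(2 * math.pi * fc * t)
            - 2.0 * A * quadrature * math.sin(2 * math.pi * fc * t))


def triangle_pulse(t, width=1.0):
    """Hypothetical real pulse, chosen only because it is easy to write down."""
    return max(0.0, 1.0 - abs(t) / width)


symbols = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]          # symbols C_1, ..., C_4
params = dict(symbols=symbols, g=triangle_pulse, A=0.5, Ts=1.0, fc=5.0)
times = [0.05 * k for k in range(120)]
# Largest difference between the two expressions over the sampled time instants.
max_gap = max(abs(qam_passband_complex_form(t, **params)
                  - qam_passband_iq_form(t, **params)) for t in times)
```

The two functions agree up to floating-point rounding, which mirrors the step from (16.6) to (16.10). The second form is the one that maps directly onto two PAM modulators followed by mixers.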


16.4 Bandwidth Considerations 


Recalling that the bandwidth of a passband signal around the carrier frequency is 
twice the bandwidth of its baseband representation (Proposition 7.6.7 and Theo- 
rem 7.7.12 (i)) we conclude: 


Note 16.4.1. If the pulse shape $g$ is bandlimited to $W/2$ Hz, then the QAM signal
(16.6) is bandlimited to $W$ Hz around the carrier frequency $f_c$.


If the pulse shape $g$ is real, then these bandwidth considerations can also be
explained in another way. We note that if $g(\cdot)$ is bandlimited to $W/2$ Hz then
the signal $\sum_\ell \operatorname{Re}(C_\ell)\, g(t - \ell T_s)$ is also bandlimited
to $W/2$ Hz, so when it is up-converted by multiplication by $\cos(2\pi f_c t)$ the
resulting signal is bandlimited to $W$ Hz around the carrier frequency $f_c$
(Proposition 7.3.3). A similar argument holds for the signal that is multiplied by
$-\sin(2\pi f_c t)$.


16.5 Orthogonality Considerations 


We next study the consequences of choosing the pulse shape $g(\cdot)$ so that its time
shifts by integer multiples of $T_s$ are orthonormal. As in our treatment of PAM, we
change notation and denote the pulse shape in this case by $\phi(\cdot)$. The
orthonormality condition is thus

$$\int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi^*(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{16.11}$$

By Corollary 11.3.4, this is equivalent to requiring that

$$\sum_{j=-\infty}^{\infty} \Bigl|\hat{\phi}\Bigl(f + \frac{j}{T_s}\Bigr)\Bigr|^2 = T_s, \tag{16.12}$$

for all frequencies $f$ outside a set of Lebesgue measure zero.


When the pulse shape satisfies the orthogonality condition (16.11) we refer to $1/T_s$
as having units of complex dimensions per second. In analogy to Definition 11.3.6,
we define the excess bandwidth as

$$100\% \left(\frac{\text{bandwidth of } \phi}{1/(2T_s)} - 1\right). \tag{16.13}$$


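Proposition 16.5.1. If the energy-limited pulse shape $\phi$ satisfies (16.11), then the
QAM signal $X_{\mathrm{PB}}(\cdot)$ can be expressed as

$$X_{\mathrm{PB}} = \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, \psi_{I,\ell} + \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, \psi_{Q,\ell}, \tag{16.14}$$

where

$$\ldots, \psi_{I,-1}, \psi_{Q,-1}, \psi_{I,0}, \psi_{Q,0}, \psi_{I,1}, \psi_{Q,1}, \ldots$$

are orthonormal functions that are given by

$$\psi_{I,\ell}\colon t \mapsto 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad \ell \in \mathbb{Z}, \tag{16.15a}$$

$$\psi_{Q,\ell}\colon t \mapsto 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad \ell \in \mathbb{Z}. \tag{16.15b}$$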


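Proof. Substituting $\phi$ for $g$ in (16.7) we obtain

$$
\begin{aligned}
X_{\mathrm{PB}}(t) ={}& \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\,
\underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)}_{\psi_{I,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{\psi_{I,\ell}(t)} \\
&+ \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\,
\underbrace{2\operatorname{Re}\Bigl(\underbrace{i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)}_{\psi_{Q,\ell,\mathrm{BB}}(t)}\, e^{i2\pi f_c t}\Bigr)}_{\psi_{Q,\ell}(t)}, \quad t \in \mathbb{R},
\end{aligned}
$$

where for every $t \in \mathbb{R}$

$$
\begin{aligned}
\psi_{I,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(\psi_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr),
\end{aligned}
\tag{16.16a}
$$

$$
\begin{aligned}
\psi_{Q,\ell}(t) &\triangleq 2\operatorname{Re}\Bigl(i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr) \\
&= 2\operatorname{Re}\bigl(\psi_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr),
\end{aligned}
\tag{16.16b}
$$

and the baseband representations are given by

$$\psi_{I,\ell,\mathrm{BB}}(t) = \frac{1}{\sqrt{2}}\, \phi(t - \ell T_s) \tag{16.17a}$$

and

$$\psi_{Q,\ell,\mathrm{BB}}(t) = i\frac{1}{\sqrt{2}}\, \phi(t - \ell T_s). \tag{16.17b}$$

We next verify that, when $\phi$ satisfies (16.11), the functions

$$\ldots, \psi_{I,-1}, \psi_{Q,-1}, \psi_{I,0}, \psi_{Q,0}, \psi_{I,1}, \psi_{Q,1}, \ldots$$

are orthonormal. To this end we recall that the inner product between two real
passband signals is twice the real part of the inner product between their baseband
representations (Theorem 7.6.10). For $\ell \ne \ell'$ we thus have by (16.11)

$$
\begin{aligned}
\langle \psi_{I,\ell}, \psi_{I,\ell'} \rangle
&= 2\operatorname{Re}\bigl(\langle \psi_{I,\ell,\mathrm{BB}}, \psi_{I,\ell',\mathrm{BB}} \rangle\bigr) \\
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0,
\end{aligned}
$$

$$
\begin{aligned}
\langle \psi_{Q,\ell}, \psi_{Q,\ell'} \rangle
&= 2\operatorname{Re}\bigl(\langle \psi_{Q,\ell,\mathrm{BB}}, \psi_{Q,\ell',\mathrm{BB}} \rangle\bigr) \\
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0,
\end{aligned}
$$

and

$$
\begin{aligned}
\langle \psi_{I,\ell}, \psi_{Q,\ell'} \rangle
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell' T_s) \Bigr\rangle\Bigr) \\
&= 0.
\end{aligned}
$$

And for $\ell = \ell'$ we have, again by (16.11),

$$
\begin{aligned}
\langle \psi_{I,\ell}, \psi_{I,\ell} \rangle
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= 1,
\end{aligned}
$$

$$
\begin{aligned}
\langle \psi_{I,\ell}, \psi_{Q,\ell} \rangle
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto \tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= \operatorname{Re}\bigl(-i\, \|\phi\|_2^2\bigr) \\
&= 0,
\end{aligned}
$$

and

$$
\begin{aligned}
\langle \psi_{Q,\ell}, \psi_{Q,\ell} \rangle
&= 2\operatorname{Re}\Bigl(\Bigl\langle t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s),\; t \mapsto i\tfrac{1}{\sqrt{2}}\, \phi(t - \ell T_s) \Bigr\rangle\Bigr) \\
&= 1.
\end{aligned}
$$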


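Notice that (16.14)-(16.15) can be simplified when $\phi$ is real:


Corollary 16.5.2. If, in addition to the assumptions of Proposition 16.5.1, we also
assume that the pulse shape $\phi$ is real, then the QAM signal can be written as

$$
\begin{aligned}
X_{\mathrm{PB}}(t) ={}& \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, \sqrt{2}\, \phi(t - \ell T_s) \cos(2\pi f_c t) \\
&- \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, \sqrt{2}\, \phi(t - \ell T_s) \sin(2\pi f_c t), \quad t \in \mathbb{R},
\end{aligned}
\tag{16.18}
$$

and

$$\bigl\{t \mapsto \sqrt{2}\, \phi(t - \ell T_s) \cos(2\pi f_c t)\bigr\}_{\ell \in \mathbb{Z}}, \quad \bigl\{t \mapsto \sqrt{2}\, \phi(t - \ell T_s) \sin(2\pi f_c t)\bigr\}_{\ell \in \mathbb{Z}}$$

are orthonormal.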


16.6 Spectral Efficiency 


We next show that QAM achieves our spectral efficiency objective. We assume
that we are only allowed to transmit signals of bandwidth $W$ around the carrier
frequency $f_c$, so the transmitted signal can only occupy the frequencies $f$ satisfying

$$\bigl||f| - f_c\bigr| \le W/2.$$

In order for the QAM signal to meet this constraint, we choose a pulse shape $\phi$
that is bandlimited to $W/2$ Hz, because the up-conversion doubles the bandwidth
(Note 16.4.1). Thus, by Corollary 11.3.5, the orthogonality (16.11) can only hold
if the baud period $T_s$ satisfies $T_s \ge 1/(2 \times W/2)$ or

$$T_s \ge \frac{1}{W},$$

with the RHS being achievable by choosing $\phi$ to be the bandwidth-$W/2$ unit-energy
signal $t \mapsto \sqrt{W}\operatorname{sinc}(Wt)$.

If we choose $T_s$ equal to $1/W$ (or only slightly larger than that), then our
modulation will support the transmission of complex symbols arriving at a rate of
$1/T_s \approx W$ complex symbols per second. And since our QAM signal only occupies
$W$ Hz around the carrier frequency, our scheme achieves a spectral efficiency of
1 [complex dimension per second] per Hz. QAM thus achieves our spectral efficiency
objective. This is so exciting that we highlight the achievement:

QAM with the bandwidth-$W/2$ unit-energy pulse shape given by
$t \mapsto \sqrt{W}\operatorname{sinc}(Wt)$ transmits a sequence of real symbols
arriving at a rate of $2W$ real symbols per second as the coefficients in a linear
combination of orthogonal signals, with the resulting waveform being bandlimited to
$W$ Hz around the carrier frequency $f_c$. It thus achieves a spectral efficiency of

$$2\ \frac{[\text{real dimension/sec}]}{[\text{passband Hz}]} = 1\ \frac{[\text{complex dimension/sec}]}{[\text{passband Hz}]}.$$
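To make the bookkeeping concrete, the following sketch computes the supported baud rate and an uncoded bit rate from the allotted passband bandwidth. It rests on two assumptions that are not spelled out at this point in the text: that excess bandwidth is measured relative to $1/(2T_s)$, so a pulse of bandwidth $W/2$ with excess bandwidth $\beta$ satisfies $W/2 = (1+\beta)/(2T_s)$, and that every constellation point carries $\log_2 M$ data bits. The function names are hypothetical.

```python
import math


def qam_symbol_rate(W, excess_bandwidth_percent=0.0):
    """Complex symbols per second supported in a passband bandwidth of W Hz."""
    # Assumed relation: W / 2 = (1 + beta) / (2 Ts), hence 1 / Ts = W / (1 + beta).
    beta = excess_bandwidth_percent / 100.0
    return W / (1.0 + beta)


def qam_bit_rate(W, constellation_size, excess_bandwidth_percent=0.0):
    """Uncoded bit rate, assuming log2(M) bits per complex symbol."""
    bits_per_symbol = math.log2(constellation_size)
    return qam_symbol_rate(W, excess_bandwidth_percent) * bits_per_symbol
```

With zero excess bandwidth the symbol rate equals $W$, i.e., 1 complex dimension per second per passband Hz. Any positive excess bandwidth lowers the rate by the factor $1/(1+\beta)$.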


16.7 QAM Constellations 


In analogy to the definition of the constellation of a PAM scheme (Section 10.8),
we define the constellation of a QAM scheme (or, perhaps more appropriately, of
the mapping $\varphi(\cdot)$ in (16.4)) as the smallest subset of $\mathbb{C}$ of
which $C_\ell$ is an element for every $\ell \in \{1, \ldots, n\}$ and for every
realization of the data bits. We denote the constellation by $\mathcal{C}$. The
number of points in the constellation $\mathcal{C}$ is just the number of elements
of $\mathcal{C}$.


Important constellations include the square 4-QAM constellation (also known as
QPSK)

$$\{+1 + i,\; -1 + i,\; -1 - i,\; +1 - i\},$$

the square QAM constellation with $(2\nu) \times (2\nu)$ points

$$\bigl\{a + ib : a, b \in \{-(2\nu - 1), \ldots, -3, -1, +1, +3, \ldots, +(2\nu - 1)\}\bigr\}, \tag{16.19}$$

and the M-PSK (M-ary Phase Shift Keying) constellation comprising the $M$ complex
numbers on the unit circle whose $M$-th power is one, i.e.,

$$\bigl\{1,\; e^{i2\pi/M},\; e^{i2 \cdot 2\pi/M},\; \ldots,\; e^{i(M-1)2\pi/M}\bigr\}.$$


See Figure 16.2 for some common QAM constellations. Please note that the square 
16-QAM and the 16-PSK are just two of many possible constellations with 16 
points. However, some engineers omit the word “square” and write 4-QAM, 16- 
QAM, 64-QAM, etc. for the respective square constellations. 


We can also define the minimum distance $\delta$ of a constellation $\mathcal{C}$ in
analogy to (10.21) as

$$\delta \triangleq \min_{\substack{c, c' \in \mathcal{C} \\ c \ne c'}} |c - c'|. \tag{16.20}$$

In analogy to (10.23), we define the second moment of a constellation $\mathcal{C}$ as

$$\frac{1}{\#\mathcal{C}} \sum_{c \in \mathcal{C}} |c|^2. \tag{16.21}$$

(Panels: 4-QAM, 16-QAM, 8-PSK, 32-QAM.)

Figure 16.2: Some QAM constellations (drawn to no particular scale).
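The constellation quantities defined in this section are easy to compute for small examples. The sketch below builds a square constellation on the odd-integer grid and an M-PSK constellation, then evaluates the minimum distance (16.20) and the second moment (16.21). The function names are hypothetical, and the odd-integer grid is an assumption about the intended form of the square constellation.

```python
import cmath
import math
from itertools import combinations


def square_qam(nu):
    """Square constellation with (2 nu) x (2 nu) points on the odd-integer grid."""
    levels = range(-(2 * nu - 1), 2 * nu, 2)   # -(2nu-1), ..., -1, +1, ..., +(2nu-1)
    return [complex(a, b) for a in levels for b in levels]


def m_psk(M):
    """The M complex numbers on the unit circle whose M-th power is one."""
    return [cmath.exp(2j * math.pi * m / M) for m in range(M)]


def minimum_distance(constellation):
    """Smallest distance between two distinct constellation points, as in (16.20)."""
    return min(abs(c - d) for c, d in combinations(constellation, 2))


def second_moment(constellation):
    """Average of |c|^2 over the constellation points, as in (16.21)."""
    return sum(abs(c) ** 2 for c in constellation) / len(constellation)


qam16 = square_qam(2)
psk8 = m_psk(8)
```

For the square 16-point constellation on this grid the minimum distance is 2 and the second moment is 10. For 8-PSK the second moment is 1 and the minimum distance is $2\sin(\pi/8)$.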


16.8 Recovering the Complex Symbols via Inner Products 


Recall that, by Proposition 16.5.1, if the time shifts of $\phi$ by integer multiples
of $T_s$ are orthonormal, then the QAM signal can be written as

$$X_{\mathrm{PB}} = \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(C_\ell)\, \psi_{I,\ell} + \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(C_\ell)\, \psi_{Q,\ell},$$

where the signals $\ldots, \psi_{I,-1}, \psi_{Q,-1}, \psi_{I,0}, \psi_{Q,0}, \psi_{I,1}, \psi_{Q,1}, \ldots$,
which are given in (16.15), are orthonormal. Consequently, the complex symbols can
be recovered from the QAM signal (in the absence of noise) using the inner product:

$$\operatorname{Re}(C_\ell) = \frac{1}{\sqrt{2}A} \langle X_{\mathrm{PB}}, \psi_{I,\ell} \rangle, \quad \ell \in \{1, \ldots, n\}, \tag{16.22a}$$

$$\operatorname{Im}(C_\ell) = \frac{1}{\sqrt{2}A} \langle X_{\mathrm{PB}}, \psi_{Q,\ell} \rangle, \quad \ell \in \{1, \ldots, n\}. \tag{16.22b}$$

We next describe circuits to compute these inner products. With a view to future
chapters where noise will be present, we shall describe more general circuits that
compute the inner products $\langle r, \psi_{I,\ell} \rangle$ and
$\langle r, \psi_{Q,\ell} \rangle$ for an arbitrary (not necessarily QAM)
energy-limited signal $r$. Moreover, since the calculation of the inner products
will not exploit the orthogonality condition (16.11), we shall describe the more
general setting where the pulse shape is arbitrary and refer to the notation of
(16.7). Thus, we shall present circuits to compute

$$\langle r, g_{I,\ell} \rangle, \quad \langle r, g_{Q,\ell} \rangle,$$

where $g_{I,\ell}$ and $g_{Q,\ell}$ and their baseband representations are given in
(16.8) and (16.9). Here $r$ is an arbitrary energy-limited signal. We present two
approaches: an approach based on baseband conversion and a direct approach.


16.8.1 Inner Products via Baseband Conversion 


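We begin by noting that if the pulse shape $g$ is bandlimited to $W/2$ Hz then both
$g_{I,\ell}$ and $g_{Q,\ell}$ are bandlimited to $W$ Hz around the carrier frequency
$f_c$. Consequently, since they contain no energy outside the bands
$[f_c - W/2, f_c + W/2]$ and $[-f_c - W/2, -f_c + W/2]$, it follows from Parseval's
Theorem that the Fourier Transform of $r$ outside these bands does not influence the
value of the inner products. Thus, if $s$ is the result of passing $r$ through an
ideal unit-gain bandpass filter of bandwidth $W$ around the carrier frequency $f_c$,
i.e.,

$$s = r \star \mathrm{BPF}_{W, f_c}, \tag{16.23}$$

then

$$\langle r, g_{I,\ell} \rangle = \langle s, g_{I,\ell} \rangle, \tag{16.24a}$$

$$\langle r, g_{Q,\ell} \rangle = \langle s, g_{Q,\ell} \rangle. \tag{16.24b}$$

If we denote the baseband representation of $s$ by $s_{\mathrm{BB}}$, then

$$
\begin{aligned}
\langle r, g_{I,\ell} \rangle &= \langle s, g_{I,\ell} \rangle \\
&= 2\operatorname{Re}\bigl(\langle s_{\mathrm{BB}}, g_{I,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\langle s_{\mathrm{BB}},\; t \mapsto g(t - \ell T_s) \rangle\bigr),
\end{aligned}
\tag{16.25a}
$$

where the first equality follows from (16.24a); the second from Theorem 7.6.10;
and the final equality from (16.9a). Similarly,

$$
\begin{aligned}
\langle r, g_{Q,\ell} \rangle &= \langle s, g_{Q,\ell} \rangle \\
&= 2\operatorname{Re}\bigl(\langle s_{\mathrm{BB}}, g_{Q,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\langle s_{\mathrm{BB}},\; t \mapsto i g(t - \ell T_s) \rangle\bigr) \\
&= \sqrt{2}\operatorname{Im}\bigl(\langle s_{\mathrm{BB}},\; t \mapsto g(t - \ell T_s) \rangle\bigr).
\end{aligned}
\tag{16.25b}
$$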


We next describe circuits to compute the RHS of (16.25a) & (16.25b). The circuit
to produce $s_{\mathrm{BB}}$ from $s$ was already discussed in Section 7.6 on the
baseband representation of passband signals (Figure 7.11). One multiplies $s(t)$ by
$e^{-i2\pi f_c t}$ and then passes the result through a lowpass filter whose cutoff
frequency $W_c$ satisfies

$$\frac{W}{2} \le W_c \le 2f_c - \frac{W}{2},$$

i.e.,

$$s_{\mathrm{BB}} = \bigl(t \mapsto s(t)\, e^{-i2\pi f_c t}\bigr) \star \mathrm{LPF}_{W_c},$$

or, in terms of real operations:

$$\operatorname{Re}(s_{\mathrm{BB}}) = \bigl(t \mapsto s(t) \cos(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c},$$

$$\operatorname{Im}(s_{\mathrm{BB}}) = -\bigl(t \mapsto s(t) \sin(2\pi f_c t)\bigr) \star \mathrm{LPF}_{W_c}.$$

This circuit is depicted in Figure 16.3. Notice that this circuit depends only on
the carrier frequency $f_c$ and on the bandwidth $W$; it does not depend on the pulse
shape.

(Block diagram: $r(t)$ is passed through $\mathrm{BPF}_{W, f_c}$, mixed with
$\cos(2\pi f_c t)$ and with its 90°-shifted version, and each product is lowpass
filtered by $\mathrm{LPF}_{W_c}$ to produce $\operatorname{Re}(s_{\mathrm{BB}})$ and
$\operatorname{Im}(s_{\mathrm{BB}})$.)

Figure 16.3: QAM demodulation: the front-end.
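Once $s_{\mathrm{BB}}$ has been computed, the calculation of the inner products on
the RHS of (16.25a) & (16.25b) is straightforward. For example, to compute the inner
product on the RHS of (16.25a) we note that from (16.25a)

$$
\begin{aligned}
\langle r, g_{I,\ell} \rangle
&= \sqrt{2}\operatorname{Re}\Bigl(\int_{-\infty}^{\infty} s_{\mathrm{BB}}(t)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt \\
&\quad + \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt,
\end{aligned}
\tag{16.26}
$$

where the terms on the RHS can be computed by feeding $\operatorname{Re}(s_{\mathrm{BB}})$
to a matched filter matched to $\operatorname{Re}(g)$ and sampling the filter's output
at time $\ell T_s$

$$\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt = \bigl(\operatorname{Re}(s_{\mathrm{BB}}) \star \operatorname{Re}(\overleftarrow{g})\bigr)(\ell T_s), \tag{16.27}$$

and by feeding $\operatorname{Im}(s_{\mathrm{BB}})$ to a matched filter matched to
$\operatorname{Im}(g)$ and sampling the filter's output at time $\ell T_s$

$$\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt = \bigl(\operatorname{Im}(s_{\mathrm{BB}}) \star \operatorname{Im}(\overleftarrow{g})\bigr)(\ell T_s). \tag{16.28}$$

Here $\overleftarrow{g}\colon t \mapsto g(-t)$ denotes the mirror image of $g$.

Similarly, to compute the inner product on the RHS of (16.25b) we note that from
(16.25b)

$$
\begin{aligned}
\langle r, g_{Q,\ell} \rangle
&= \sqrt{2}\operatorname{Im}\Bigl(\int_{-\infty}^{\infty} s_{\mathrm{BB}}(t)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt \\
&\quad - \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt,
\end{aligned}
\tag{16.29}
$$

where the inner products can be computed again using a matched filter:

$$\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt = \bigl(\operatorname{Im}(s_{\mathrm{BB}}) \star \operatorname{Re}(\overleftarrow{g})\bigr)(\ell T_s),$$

$$\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt = \bigl(\operatorname{Re}(s_{\mathrm{BB}}) \star \operatorname{Im}(\overleftarrow{g})\bigr)(\ell T_s).$$

Things become simpler when the pulse shape $g$ is real. In this case (16.26) and
(16.29) simplify to

$$\langle r, g_{I,\ell} \rangle = \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Re}\bigl(s_{\mathrm{BB}}(t)\bigr)\, g(t - \ell T_s)\, dt, \quad g \text{ real}, \tag{16.30a}$$

$$\langle r, g_{Q,\ell} \rangle = \sqrt{2}\int_{-\infty}^{\infty} \operatorname{Im}\bigl(s_{\mathrm{BB}}(t)\bigr)\, g(t - \ell T_s)\, dt, \quad g \text{ real}. \tag{16.30b}$$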


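Diagrams demonstrating how these inner products are computed are given in
Figures 16.3 and 16.4. We have already discussed the first diagram, which includes the
front-end bandpass filter and the circuit for producing $s_{\mathrm{BB}}$. The second
diagram includes the matched filtering needed to compute the RHS of (16.30a) and the
RHS of (16.30b). Notice that we have accomplished our second objective in that
the first circuit depends only on the carrier frequency $f_c$ (and the bandwidth $W$)
and the second circuit depends on the pulse shape but not on the carrier frequency.

(Block diagram: $\operatorname{Re}(s_{\mathrm{BB}})$ and
$\operatorname{Im}(s_{\mathrm{BB}})$ are each fed to a filter matched to $g$ whose
output is sampled at time $\ell T_s$, yielding $\langle r, g_{I,\ell} \rangle$ and
$\langle r, g_{Q,\ell} \rangle$ as in (16.30).)

Figure 16.4: QAM demodulation: matched filtering (g real).


16.8.2 Computing Inner Products Directly


The astute reader may have noticed that neither the bandpass filtering of the
signal $r$ nor the image rejection filters that produce $s_{\mathrm{BB}}$ are needed
for the computation of the inner products. Indeed, starting from (16.8a)

$$
\begin{aligned}
\langle r, g_{I,\ell} \rangle
&= \bigl\langle r,\; t \mapsto 2\operatorname{Re}\bigl(g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr) \bigr\rangle \\
&= 2\operatorname{Re}\bigl(\bigl\langle r,\; t \mapsto g_{I,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t} \bigr\rangle\bigr) \\
&= 2\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\; g_{I,\ell,\mathrm{BB}} \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\; t \mapsto g(t - \ell T_s) \bigr\rangle\bigr),
\end{aligned}
\tag{16.31a}
$$

where the second equality follows because $r$ is real and the last equality from
(16.9a). Similarly, starting from (16.8b)

$$
\begin{aligned}
\langle r, g_{Q,\ell} \rangle
&= \bigl\langle r,\; t \mapsto 2\operatorname{Re}\bigl(g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t}\bigr) \bigr\rangle \\
&= 2\operatorname{Re}\bigl(\bigl\langle r,\; t \mapsto g_{Q,\ell,\mathrm{BB}}(t)\, e^{i2\pi f_c t} \bigr\rangle\bigr) \\
&= 2\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\; g_{Q,\ell,\mathrm{BB}} \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Re}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\; t \mapsto i g(t - \ell T_s) \bigr\rangle\bigr) \\
&= \sqrt{2}\operatorname{Im}\bigl(\bigl\langle t \mapsto r(t)\, e^{-i2\pi f_c t},\; t \mapsto g(t - \ell T_s) \bigr\rangle\bigr),
\end{aligned}
\tag{16.31b}
$$

where the fourth equality follows from (16.9b). Notice that the RHS of (16.31a)
and the RHS of (16.31b) do not involve any filtering. To see how to implement
them with real operations we can write them more explicitly as: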


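$$\langle r, g_{I,\ell} \rangle = \sqrt{2}\operatorname{Re}\Bigl(\int_{-\infty}^{\infty} r(t)\, e^{-i2\pi f_c t}\, g^*(t - \ell T_s)\, dt\Bigr),$$

$$\langle r, g_{Q,\ell} \rangle = \sqrt{2}\operatorname{Im}\Bigl(\int_{-\infty}^{\infty} r(t)\, e^{-i2\pi f_c t}\, g^*(t - \ell T_s)\, dt\Bigr),$$

or even more explicitly in terms of real operations as:

$$
\begin{aligned}
\langle r, g_{I,\ell} \rangle ={}& \sqrt{2}\int_{-\infty}^{\infty} r(t) \cos(2\pi f_c t) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt \\
&- \sqrt{2}\int_{-\infty}^{\infty} r(t) \sin(2\pi f_c t) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt,
\end{aligned}
\tag{16.32a}
$$

$$
\begin{aligned}
\langle r, g_{Q,\ell} \rangle ={}& -\sqrt{2}\int_{-\infty}^{\infty} r(t) \cos(2\pi f_c t) \operatorname{Im}\bigl(g(t - \ell T_s)\bigr)\, dt \\
&- \sqrt{2}\int_{-\infty}^{\infty} r(t) \sin(2\pi f_c t) \operatorname{Re}\bigl(g(t - \ell T_s)\bigr)\, dt.
\end{aligned}
\tag{16.32b}
$$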
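The direct computation can be tried numerically. The sketch below generates a noiseless QAM signal with a real pulse via (16.10) and recovers the symbols with the real-operation integrals of (16.32), scaled as in (16.22). It departs from the setting of this chapter in one respect that should be kept in mind: the unit-energy rectangular pulse used here is not bandlimited, and it is chosen only because its inner products are easy to integrate numerically. The carrier is assumed to satisfy that $f_c T_s$ is an integer, which makes the relevant waveforms orthonormal for this pulse. All names and parameter values are hypothetical.

```python
import math


def modulate(t, symbols, pulse, A, Ts, fc):
    """QAM signal (16.10) for a real pulse; the symbols are indexed l = 1, ..., n."""
    i_part = sum(c.real * pulse(t - l * Ts) for l, c in enumerate(symbols, start=1))
    q_part = sum(c.imag * pulse(t - l * Ts) for l, c in enumerate(symbols, start=1))
    w = 2 * math.pi * fc * t
    return 2 * A * (i_part * math.cos(w) - q_part * math.sin(w))


def demodulate(r, n, pulse, A, Ts, fc, samples_per_symbol=64):
    """Evaluate (16.32) for a real pulse by the midpoint rule, then scale by 1/(sqrt(2) A)."""
    dt = Ts / samples_per_symbol
    recovered = []
    for l in range(1, n + 1):
        ip_i = ip_q = 0.0
        for k in range(samples_per_symbol):
            t = l * Ts + (k + 0.5) * dt            # midpoints inside the pulse support
            w = 2 * math.pi * fc * t
            ip_i += math.sqrt(2) * r(t) * math.cos(w) * pulse(t - l * Ts) * dt
            ip_q -= math.sqrt(2) * r(t) * math.sin(w) * pulse(t - l * Ts) * dt
        recovered.append(complex(ip_i, ip_q) / (math.sqrt(2) * A))
    return recovered


Ts, fc, A = 1.0, 4.0, 0.7                          # fc * Ts is an integer by assumption


def rect_pulse(t):
    """Unit-energy rectangular pulse on [0, Ts); real but not bandlimited."""
    return 1.0 / math.sqrt(Ts) if 0.0 <= t < Ts else 0.0


sent = [1 + 1j, -3 + 1j, 3 - 3j, -1 - 1j]
received = lambda t: modulate(t, sent, rect_pulse, A, Ts, fc)
estimates = demodulate(received, len(sent), rect_pulse, A, Ts, fc)
```

No filtering appears anywhere in `demodulate`: each symbol is obtained from two correlations of the received waveform against carrier-modulated copies of the pulse.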


The two approaches we discussed for computing the inner products are, of course, 
mathematically equivalent. The former makes more engineering sense, because the 
bandpass filter typically guarantees that the energy in s is significantly smaller 
than in r, thus reducing the dynamic range required from the rest of the receiver. 


The latter approach is mathematically cleaner because it requires less mathematical
justification. One need not check that the various filters satisfy the required
integrability conditions. Moreover, this approach is more useful when $r$ is not
energy-limited and when this is compensated for by the fast decay of the pulse
shape. (See, for example, the situation addressed by Proposition 3.4.4.)


16.9 Exercises 


Exercise 16.1 (Nyquist’s Criterion and Passband Signals). Corollary 11.3.4 provides
conditions under which the time shifts of a signal by integer multiples of $T_s$ are
orthonormal. Discuss how these conditions apply to real passband signals of
bandwidth $W$ around the carrier frequency $f_c$. Specifically:


(i) Plot the function

$$f \mapsto \sum_{j=-\infty}^{\infty} \Bigl|\hat{y}\Bigl(f + \frac{j}{T_s}\Bigr)\Bigr|^2$$

for the passband signal $y$ of Figure 7.2. Pay attention to how the sum at positive
frequencies is influenced by the signal's FT at negative frequencies.


(ii) Show that there exists a passband signal $\phi(\cdot)$ whose bandwidth $W$ around
the carrier frequency $f_c$ is $1/(2T_s)$ and whose time shifts by integer multiples
of $T_s$ are orthonormal if, and only if, $4T_s f_c$ is an odd integer. Show that such
a signal must satisfy (outside a set of frequencies of Lebesgue measure zero)

$$|\hat{\phi}(f)| = \sqrt{T_s}\, \mathrm{I}\Bigl\{\bigl||f| - f_c\bigr| \le \frac{1}{4T_s}\Bigr\}, \quad f \in \mathbb{R}.$$


(iii) Let $\psi$ be an energy-limited baseband signal of bandwidth $W/2$ whose FT is a
symmetric function of frequency and whose time shifts by integer multiples of $(2T_s)$
are orthonormal. Let the carrier frequency $f_c$ be larger than $W/2$ and satisfy
that $4T_s f_c$ is an odd integer. Show that the (possibly complex) passband signal
$t \mapsto \sqrt{2}\cos(2\pi f_c t)\, \psi(t)$ is of bandwidth $W$ around the carrier
$f_c$, and its time shifts by integer multiples of $T_s$ are orthonormal.


Exercise 16.2 (How General is QAM?). Under what conditions on $A$, $f_c$, $\phi$, $W$,
and $T_s$ can we view the signal


tr ARe Ce S> Cr sine(Wt — ct.)) 


l=1 


as a QAM signal? 


Exercise 16.3 (M-PSK). Consider a QAM signal $X_{\mathrm{PB}}$ of the form (16.6) with
the pulse shape $g\colon t \mapsto \mathrm{I}\{-T_s/2 < t < T_s/2\}$ and symbols
$(C_\ell)$ that are IID and uniformly distributed over the set

$$\bigl\{e^{i2\pi/8},\; e^{2i2\pi/8},\; \ldots,\; e^{7i2\pi/8},\; 1\bigr\}.$$

(i) Plot a sample function of $(X_{\mathrm{PB}}(t),\ t \in \mathbb{R})$.

(ii) Are the sample paths continuous?

(iii) Express $X_{\mathrm{PB}}(t)$ in the form $2A\cos(2\pi f_c t + \Phi(t))$ and
describe $\Phi(\cdot)$. Plot a sample path of $(\Phi(t))$.


Exercise 16.4 (Transmission Rate, Encoder Rate, and Bandwidth). Data bits are to be
transmitted at rate $R_b$ bits per second using QAM with a pulse shape $\phi$
satisfying the orthonormality condition (16.11).


(i) Let W be the allotted bandwidth around the carrier frequency. What is the minimal 
constellation size required for the data bits to be reliably communicated in the 
absence of noise? 


(ii) Repeat Part (i) if you are required to use a pulse shape of excess-bandwidth of 
B = 15% or more. 


Exercise 16.5 (Synthesis of 16-QAM). Let $X_1(\cdot)$ and $X_2(\cdot)$ be 4-QAM (QPSK)
signals that are given for every $t \in \mathbb{R}$ by

$$X_\nu(t) = 2A \operatorname{Re}\Bigl(\sum_{\ell=1}^{n} C_\ell^{(\nu)}\, g(t - \ell T_s)\, e^{i2\pi f_c t}\Bigr), \quad \nu = 1, 2,$$

where the symbols $\bigl(C_\ell^{(\nu)}\bigr)$ take on the values $\pm 1 \pm i$. Show
that for the right choice of the constant $\alpha \in \mathbb{R}$, the signal

$$X(t) = \alpha X_1(t) + X_2(t), \quad t \in \mathbb{R}$$

can be viewed as a 16-QAM signal with a square constellation.


Exercise 16.6 (Orthogonality of the In-Phase and Quadrature Components). Let the
pulse shape $g$ be a real integrable signal that is bandlimited to $W/2$ Hz, and let
the carrier frequency $f_c$ be larger than $W/2$. Show that, even if the time shifts
of $g$ by integer multiples of $T_s$ are not orthonormal, the signals

$$t \mapsto g(t - \ell T_s) \cos(2\pi f_c t + \varphi) \quad \text{and} \quad t \mapsto g(t - \ell' T_s) \sin(2\pi f_c t + \varphi)$$

are orthogonal for all integers $\ell, \ell'$ (not necessarily distinct). Here
$\varphi \in [-\pi, \pi)$ is arbitrary.


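Exercise 16.7 (The Importance of the Phase). Let $x$ and $y$ be real integrable signals
that are bandlimited to $W/2$ Hz. Let the transmitted signal $s$ be

$$
\begin{aligned}
s(t) &= \operatorname{Re}\bigl(\bigl(x(t) + iy(t)\bigr)\, e^{i(2\pi f_c t + \phi_T)}\bigr) \\
&= x(t) \cos(2\pi f_c t + \phi_T) - y(t) \sin(2\pi f_c t + \phi_T), \quad t \in \mathbb{R},
\end{aligned}
$$

where $f_c > W/2$, and where $\phi_T$ denotes the phase of the transmitted carrier. The
receiver multiplies $s(t)$ by $2\cos(2\pi f_c t + \phi_R)$ (where $\phi_R$ denotes the
phase of the receiver's oscillator) and passes the resulting product through a lowpass
filter of cutoff frequency $W/2$ to produce the signal $\tilde{x}$:

$$\tilde{x}(t) = \Bigl(\bigl(\tau \mapsto s(\tau)\, 2\cos(2\pi f_c \tau + \phi_R)\bigr) \star \mathrm{LPF}_{W/2}\Bigr)(t), \quad t \in \mathbb{R}.$$

Express $\tilde{x}(\cdot)$ in terms of $x(\cdot)$, $y(\cdot)$, $\phi_T$ and $\phi_R$.
Evaluate your expression in the following cases: $\phi_T = \phi_R$,
$\phi_T - \phi_R = \pi$, $\phi_T - \phi_R = \pi/2$, and $\phi_T - \phi_R = \pi/4$.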


Exercise 16.8 (Phase Imprecision). Consider QAM with a real pulse shape and a receiver
that performs a conversion to baseband followed by matched filtering (Section 16.8.1).
Write an expression for the output of the receiver if its oscillator is at the right
frequency but lags the phase of the transmitter's oscillator by $\Delta\phi$.


Exercise 16.9 (Rotating a QAM Constellation). Show that rotating a QAM constellation
changes neither its second moment nor its minimum distance.


Exercise 16.10 (Optimal Rectangular Constellation). Consider all rectangular
constellations of the form

$$\{a + ib,\; a - ib,\; -a + ib,\; -a - ib\},$$

where $a$ and $b$ are real. Which of these constellations whose second moment is one
has the largest minimum distance?


Chapter 17 


Complex Random Variables and Processes 


17.1 Introduction 


We first encountered complex random variables in Chapter 16 on QAM. There we 
considered an encoder that maps k-tuples of bits into n-tuples of complex numbers, 
and we then considered the result of applying this encoder to random bits. The 
resulting symbols were therefore random and were taking value in the complex 
field, i.e., they were complex random variables. Complex random variables are 
functions that map “luck” into the complex field: they map every outcome of the
experiment $\omega \in \Omega$ to a complex number. Thus, they are very much like regular
random variables, except that they take value in the complex field. They can 
always be considered as pairs of real variables: their real and imaginary parts. 


It is perfectly meaningful to discuss their expectation and variance. If C is a 
complex random variable, then 


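$$\mathrm{E}[C] = \mathrm{E}[\operatorname{Re}(C)] + i\, \mathrm{E}[\operatorname{Im}(C)],$$

$$\mathrm{E}\bigl[|C|^2\bigr] = \mathrm{E}\bigl[(\operatorname{Re}(C))^2\bigr] + \mathrm{E}\bigl[(\operatorname{Im}(C))^2\bigr],$$

and

$$
\begin{aligned}
\operatorname{Var}[C] &= \mathrm{E}\bigl[|C - \mathrm{E}[C]|^2\bigr] \\
&= \mathrm{E}\bigl[|C|^2\bigr] - \bigl|\mathrm{E}[C]\bigr|^2.
\end{aligned}
$$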


In this chapter we shall make the above definition of complex random variables 
more formal and also discuss complex random vectors and complex stochastic pro- 
cesses. 


Complex random variables can be avoided if one treats such variables as pairs 
of real variables. However, we do not recommend this approach. Many of the 
complex variables and processes encountered in Digital Communications possess 
additional properties that simplify their manipulation, and complex variables are 
better suited to take advantage of these simplifications. 


We begin this chapter with some notation followed by some basic definitions for
complex random variables. We next introduce a property that simplifies their
manipulation: properness. (Another such property, circular symmetry, is described
in Chapter 24.) Finally, we extend the discussion to complex random vectors and
conclude with a discussion of complex stochastic processes.


17.2 Notation 


The notation we use in this chapter is fairly standard. The only issue that may 
need clarification is the difference between three matrix/vector operations: trans- 
position, conjugation, and Hermitian conjugation. These operations are described 
next. 

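All vectors in this chapter are column vectors. Thus, a vector $\mathbf{a}$ whose
components are $a^{(1)}, \ldots, a^{(n)}$ is the column vector

$$\mathbf{a} = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(n)} \end{pmatrix}. \tag{17.1}$$

We shall sometimes refer to such a vector $\mathbf{a}$ as an $n$-vector to make the
number of its components explicit. For typesetting reasons, we shall usually use the
notation

$$\mathbf{a} = \bigl(a^{(1)}, \ldots, a^{(n)}\bigr)^{\mathsf{T}}, \tag{17.2}$$

which is more space efficient. Here the operator $(\cdot)^{\mathsf{T}}$ denotes the
matrix transpose. Thus if we think of $(a^{(1)}, \ldots, a^{(n)})$ as a $1 \times n$
matrix, then $(a^{(1)}, \ldots, a^{(n)})^{\mathsf{T}}$ is this matrix's transpose,
i.e., an $n \times 1$ matrix, or a vector. More generally, if $\mathsf{A}$ is an
$n \times m$ matrix, then $\mathsf{A}^{\mathsf{T}}$ is an $m \times n$ matrix whose
Row-$j$ Column-$\ell$ component is the Row-$\ell$ Column-$j$ component of $\mathsf{A}$.
We say that $\mathsf{A}$ is symmetric if $\mathsf{A}^{\mathsf{T}} = \mathsf{A}$.


We use $(\cdot)^*$ to denote componentwise complex conjugation. Thus, if $\mathbf{a}$
is as in (17.1), then

$$\mathbf{a}^* = \begin{pmatrix} (a^{(1)})^* \\ \vdots \\ (a^{(n)})^* \end{pmatrix}. \tag{17.3}$$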

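We use $(\cdot)^{\dagger}$ to denote Hermitian conjugation, i.e., the componentwise
conjugate of the transposed matrix. Thus, if $\mathbf{a}$ is as in (17.1), then
$\mathbf{a}^{\dagger}$ is the $1 \times n$ matrix

$$\mathbf{a}^{\dagger} = \bigl((a^{(1)})^*, \ldots, (a^{(n)})^*\bigr). \tag{17.4}$$

The Hermitian conjugate $\mathsf{A}^{\dagger}$ of an $n \times m$ matrix $\mathsf{A}$
is an $m \times n$ matrix whose Row-$j$ Column-$\ell$ component is the complex
conjugate of the Row-$\ell$ Column-$j$ component of the matrix $\mathsf{A}$. We say
that a matrix $\mathsf{A}$ is conjugate-symmetric or self-adjoint or Hermitian if
$\mathsf{A}^{\dagger} = \mathsf{A}$.

Note that if $\mathbf{a}$ and $\mathbf{b}$ are $n$-vectors, then
$\mathbf{a}^{\mathsf{T}}\mathbf{b}$ is a scalar

$$\mathbf{a}^{\mathsf{T}}\mathbf{b} = \sum_{j=1}^{n} a^{(j)} b^{(j)}, \tag{17.5}$$

whereas $\mathbf{a}\mathbf{b}^{\mathsf{T}}$ is the $n \times n$ matrix

$$
\mathbf{a}\mathbf{b}^{\mathsf{T}} =
\begin{pmatrix}
a^{(1)}b^{(1)} & a^{(1)}b^{(2)} & \cdots & a^{(1)}b^{(n)} \\
a^{(2)}b^{(1)} & a^{(2)}b^{(2)} & \cdots & a^{(2)}b^{(n)} \\
\vdots & \vdots & \ddots & \vdots \\
a^{(n)}b^{(1)} & a^{(n)}b^{(2)} & \cdots & a^{(n)}b^{(n)}
\end{pmatrix}.
$$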

17.3 Complex Random Variables


We say that $C$ is a complex random variable (CRV) on the probability space
$(\Omega, \mathcal{F}, P)$ if $C\colon \Omega \to \mathbb{C}$ is a mapping from $\Omega$
to the complex field $\mathbb{C}$ such that both $\operatorname{Re}(C)$ and
$\operatorname{Im}(C)$ are random variables on $(\Omega, \mathcal{F}, P)$.


Any CRV Z can be written in the form Z = X +iY, where X and Y are real 
random variables. But there are some advantages to studying complex random 
variables over pairs of real random variables. Those will become apparent when we 
discuss analytic functions of complex random variables and when we discuss com- 
plex random variables that have special properties such as that of being “proper” 
or that of being “circularly-symmetric.” 


Many of the definitions related to complex random variables are similar to the 
analogous definitions for pairs of real random variables, but some are not. We 
shall try to emphasize the latter. 


17.3.1 Distribution and Density 


Since it makes no sense to say that one complex number is smaller than another, we
cannot define the cumulative distribution function (CDF) of a CRV as in the real
case: an expression like “$\Pr[Z \le 1 + i]$” is meaningless. We can, however, discuss
the joint distribution function of the real and imaginary parts of a CRV, which
specifies $\Pr[\operatorname{Re}(Z) \le x, \operatorname{Im}(Z) \le y]$ for all
$x, y \in \mathbb{R}$. We say that two complex random variables $W$ and $Z$ are of
equal law (or have the same distribution) and write
$W \stackrel{\mathscr{L}}{=} Z$, if the joint distribution of the pair
$(\operatorname{Re}(W), \operatorname{Im}(W))$ is identical to the joint distribution
of the pair $(\operatorname{Re}(Z), \operatorname{Im}(Z))$:

$$
\bigl(W \stackrel{\mathscr{L}}{=} Z\bigr) \iff
\Bigl(\Pr[\operatorname{Re}(W) \le x, \operatorname{Im}(W) \le y] = \Pr[\operatorname{Re}(Z) \le x, \operatorname{Im}(Z) \le y], \quad x, y \in \mathbb{R}\Bigr).
\tag{17.6}
$$

Similarly, we can define the density function $f_Z(\cdot)$ (if it exists) of a CRV $Z$
at the point $z \in \mathbb{C}$ as the joint density of the real pair
$(\operatorname{Re}(Z), \operatorname{Im}(Z))$ at
$(\operatorname{Re}(z), \operatorname{Im}(z))$:

$$f_Z(z) \triangleq f_{\operatorname{Re}(Z), \operatorname{Im}(Z)}\bigl(\operatorname{Re}(z), \operatorname{Im}(z)\bigr), \quad z \in \mathbb{C}, \tag{17.7}$$

which can also be written as

$$f_Z(z) = \frac{\partial^2}{\partial x\, \partial y} \Pr[\operatorname{Re}(Z) \le x, \operatorname{Im}(Z) \le y]\,\bigg|_{x = \operatorname{Re}(z),\, y = \operatorname{Im}(z)}, \quad z \in \mathbb{C}. \tag{17.8}$$

The notions of distribution function and density of a CRV extend immediately to
pairs of complex variables and, more generally, to n-tuples.


17.3.2 The Expectation 


The expectation of a CRV can be defined in terms of the expectations of its real 
and imaginary parts: 


E[Z] = E[Re(Z)] +iE[Im(Z)], (17.9) 


provided that the two real expectations E[Re(Z)] and E[Im(Z)] are finite. With 
this definition one can readily verify that, whenever E[Z] is defined, conjugation 
and expectation commute 


    E[Z^{*}] = (E[Z])^{*}    (17.10)

and 
Re(E[Z]) = E[Re(Z)], (17.11a) 
Im(E[Z]) = E[Im(Z)]. (17.11b) 


If the CRV Z has a density f_Z(·), then the expectation E[g(Z)] for some measurable
function g: ℂ → ℂ can be formally written as

    E[g(Z)] = \int_{z \in \mathbb{C}} f_Z(z) \, g(z) \, dz    (17.12)

or, in terms of real integrals, as

    E[g(Z)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_Z(x + iy) \operatorname{Re}\bigl(g(x + iy)\bigr) \, dx \, dy
              + i \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_Z(x + iy) \operatorname{Im}\bigl(g(x + iy)\bigr) \, dx \, dy.    (17.13)

Thus, rather than computing the distribution of g(Z) and then computing the
expectations of its real and imaginary parts, one can use (17.12).
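17.3.3 The Variance


The definition of the variance of a CRV is not consistent with viewing the CRV as
a pair of real random variables. The variance Var[Z] of a CRV Z is defined as

    \operatorname{Var}[Z] = E\bigl[ |Z - E[Z]|^2 \bigr]                                  (17.14a)
                          = E\bigl[ |Z|^2 \bigr] - |E[Z]|^2                              (17.14b)
                          = \operatorname{Var}[\operatorname{Re}(Z)] + \operatorname{Var}[\operatorname{Im}(Z)].    (17.14c)

This definition should be contrasted with the definition of the covariance matrix
of the pair (Re(Z), Im(Z))

    \begin{pmatrix}
        \operatorname{Var}[\operatorname{Re}(Z)] & \operatorname{Cov}[\operatorname{Re}(Z), \operatorname{Im}(Z)] \\
        \operatorname{Cov}[\operatorname{Re}(Z), \operatorname{Im}(Z)] & \operatorname{Var}[\operatorname{Im}(Z)]
    \end{pmatrix}.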


One can compute the variance of Z from the covariance matrix of (Re(Z), Im(Z)), 
but not the other way around. Indeed, the variance of Z is just the trace of the 
covariance matrix of (Re(Z), Im(Z)). 


To derive (17.14b) from (17.14a) we note that 


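    E\bigl[ |Z - E[Z]|^2 \bigr] = E\bigl[ (Z - E[Z]) (Z - E[Z])^{*} \bigr]
                                = E\bigl[ (Z - E[Z]) (Z^{*} - E[Z^{*}]) \bigr]
                                = E\bigl[ (Z - E[Z]) Z^{*} \bigr] - E\bigl[ Z - E[Z] \bigr] \, E[Z^{*}]
                                = E\bigl[ (Z - E[Z]) Z^{*} \bigr]
                                = E[Z Z^{*}] - E[Z] \, E[Z^{*}]
                                = E\bigl[ |Z|^2 \bigr] - |E[Z]|^2,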


where we only used the linearity of expectation and (17.10). Here the first equality 
follows by writing |w|? as ww*; the second by (17.10); the third by simple algebra; 
the fourth because the expectation of Z — E[Z] is zero; and the final by (17.10). 


To derive (17.14c) from (17.14b) we write E[|Z|²] as E[(Re(Z))² + (Im(Z))²] and
express |E[Z]|² using (17.9) as (E[Re(Z)])² + (E[Im(Z)])².
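
The three expressions in (17.14) can be checked numerically on a small example. The following Python sketch uses a hypothetical three-point CRV (the values and probabilities are assumptions chosen only for illustration) and evaluates (17.14a), (17.14b), and the trace of the covariance matrix of (Re(Z), Im(Z)). It also computes the off-diagonal covariance, which the variance alone does not reveal.

```python
# Hypothetical discrete CRV: (value, probability) pairs chosen for illustration.
pmf = [(1 + 2j, 0.5), (-1 + 0j, 0.25), (0 - 2j, 0.25)]


def expect(f):
    """Return E[f(Z)] for the discrete CRV described by pmf."""
    return sum(p * f(z) for z, p in pmf)


mean = expect(lambda z: z)

var_a = expect(lambda z: abs(z - mean) ** 2)             # (17.14a)
var_b = expect(lambda z: abs(z) ** 2) - abs(mean) ** 2   # (17.14b)

# Entries of the covariance matrix of the real pair (Re(Z), Im(Z)).
var_re = expect(lambda z: (z.real - mean.real) ** 2)
var_im = expect(lambda z: (z.imag - mean.imag) ** 2)
cov_re_im = expect(lambda z: (z.real - mean.real) * (z.imag - mean.imag))
var_c = var_re + var_im                                  # (17.14c): the trace

print(var_a, var_b, var_c, cov_re_im)
```

The three variance values agree up to rounding, whereas cov_re_im is extra information carried only by the covariance matrix.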


17.3.4 Proper Complex Random Variables 


Many of the complex random variables that appear in Digital Communications 
are proper. This is a concept that has no natural counterpart for real random 
variables. 


Definition 17.3.1 (Proper CRV). We say that the CRV Z is proper if the following
three conditions are all satisfied: it is of zero mean; it is of finite variance; and

    E[Z^2] = 0.    (17.15)


Notice that the LHS of (17.15) is, in general, a complex number, so (17.15) is 
equivalent to two real equations: 


    E[\operatorname{Re}(Z)^2] = E[\operatorname{Im}(Z)^2]    (17.16a)
and
    E[\operatorname{Re}(Z) \operatorname{Im}(Z)] = 0.    (17.16b)


This leads to the following characterization of proper complex random variables. 


Proposition 17.3.2. A CRV Z is proper if, and only if, all three of the following 
conditions are satisfied: Z is of zero mean; Re(Z) & Im(Z) have the same finite 
variance; and Re(Z) & Im(Z) are uncorrelated. 


An example of a proper CRV is one taking on the four values {±1, ±i} equiprobably.
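
The three conditions for properness can be verified directly for the four-point example. The sketch below is only an illustration; the contrasting real-valued variable on {+1, −1} is an assumption added here to show a zero-mean, finite-variance CRV that fails (17.15).

```python
# The four-point CRV from the text: each value has probability 1/4.
values = [1 + 0j, -1 + 0j, 1j, -1j]
p = 1 / len(values)

mean = sum(p * z for z in values)             # E[Z]
pseudo = sum(p * z * z for z in values)       # E[Z^2], the LHS of (17.15)
var = sum(p * abs(z) ** 2 for z in values)    # Var[Z], because the mean is zero

# Contrast (assumed example): a real-valued variable uniform on {+1, -1}
# has zero mean and unit variance, but E[Z^2] = 1, so it is not proper.
real_values = [1 + 0j, -1 + 0j]
pseudo_real = sum(0.5 * z * z for z in real_values)

print(mean, pseudo, var, pseudo_real)
```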


We mentioned earlier in Section 17.3.3 that the variance of a CRV is not the 
same as the covariance matrix of the tuple consisting of its real and imaginary 
parts. While the covariance matrix determines the variance, the variance does not 
uniquely determine the covariance matrix. However, if a CRV is proper, then its 
variance uniquely determines the covariance matrix of its real and imaginary parts. 
Indeed, by Proposition 17.3.2, a zero-mean finite-variance CRV is proper if, and 
only if, the covariance matrix of the pair (Re(Z), Im(Z)) is given by 


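    \begin{pmatrix}
        \tfrac{1}{2} \operatorname{Var}[Z] & 0 \\
        0 & \tfrac{1}{2} \operatorname{Var}[Z]
    \end{pmatrix}.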


17.3.5 The Covariance 


The covariance Cov[Z, W] between the complex random variables Z and W is
defined by

    \operatorname{Cov}[Z, W] \triangleq E\bigl[ (Z - E[Z]) (W - E[W])^{*} \bigr].    (17.17)


Again, this definition is different from the one for pairs of real random variables: 
the covariance between two pairs of real random variables is a real matrix, whereas 
the covariance between two CRVs is a complex scalar. 


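Some of the key properties of the covariance are listed next. They hold whenever
the α’s and β’s are deterministic complex numbers and the covariances on the RHS
are defined.


(i) Conjugate Symmetry:

    \operatorname{Cov}[Z, W] = (\operatorname{Cov}[W, Z])^{*}.    (17.18)


(ii) Sesquilinearity:

    \operatorname{Cov}[\alpha Z, W] = \alpha \operatorname{Cov}[Z, W],                                      (17.19)
    \operatorname{Cov}[Z_1 + Z_2, W] = \operatorname{Cov}[Z_1, W] + \operatorname{Cov}[Z_2, W],           (17.20)
    \operatorname{Cov}[Z, \beta W] = \beta^{*} \operatorname{Cov}[Z, W],                                   (17.21)
    \operatorname{Cov}[Z, W_1 + W_2] = \operatorname{Cov}[Z, W_1] + \operatorname{Cov}[Z, W_2],           (17.22)

and, more generally,

    \operatorname{Cov}\Bigl[ \sum_{j=1}^{n} \alpha_j Z_j, \sum_{j'=1}^{n'} \beta_{j'} W_{j'} \Bigr]
        = \sum_{j=1}^{n} \sum_{j'=1}^{n'} \alpha_j \beta_{j'}^{*} \operatorname{Cov}[Z_j, W_{j'}].    (17.23)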
(iii) Relation with Variance:

    \operatorname{Var}[Z] = \operatorname{Cov}[Z, Z].    (17.24)


(iv) Variance of Linear Functionals:

    \operatorname{Var}\Bigl[ \sum_{j=1}^{n} \alpha_j Z_j \Bigr]
        = \sum_{j=1}^{n} \sum_{j'=1}^{n} \alpha_j \alpha_{j'}^{*} \operatorname{Cov}[Z_j, Z_{j'}].    (17.25)

17.3.6 The Characteristic Function 


The definition of the characteristic function of a CRV is consistent with viewing it as
a pair of real random variables. Recall that the characteristic function Φ_X: ℝ → ℂ
of a real random variable X is defined by

    \Phi_X \colon \varpi \mapsto E\bigl[ e^{i \varpi X} \bigr], \quad \varpi \in \mathbb{R}.    (17.26)

For a pair of real random variables X, Y the joint characteristic function is the
mapping Φ_{X,Y}: ℝ² → ℂ defined by

    \Phi_{X,Y} \colon (\varpi_1, \varpi_2) \mapsto E\bigl[ e^{i (\varpi_1 X + \varpi_2 Y)} \bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}.    (17.27)

Note that the expectations in (17.26) and (17.27) are always defined, because the
argument to the expectation operator is of modulus one (|e^{ir}| = 1 whenever r is
real). This motivates us to define the characteristic function for a complex random
variable as follows.


Definition 17.3.3 (Characteristic Function of a CRV). The characteristic function
Φ_Z: ℂ → ℂ of a complex random variable Z is defined as

    \Phi_Z(\varpi) \triangleq E\bigl[ e^{i \operatorname{Re}(\varpi^{*} Z)} \bigr]
                   = E\bigl[ e^{i (\operatorname{Re}(\varpi) \operatorname{Re}(Z) + \operatorname{Im}(\varpi) \operatorname{Im}(Z))} \bigr], \quad \varpi \in \mathbb{C}.

Here we can think of Re(ϖ) and Im(ϖ) as playing the role of ϖ_1 and ϖ_2 in (17.27).
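
As an illustration, the characteristic function of the CRV that is uniform on {±1, ±i} can be evaluated in both forms of Definition 17.3.3. The Python sketch below is a minimal check, not part of the theory; for this particular CRV a short calculation (an observation added here) gives the closed form (cos Re(ϖ) + cos Im(ϖ))/2, which the code compares against.

```python
import cmath
import math

# The CRV uniform on {+1, -1, +i, -i}.
values = [1 + 0j, -1 + 0j, 1j, -1j]


def char_fn(w):
    """Phi_Z(w) = E[exp(i Re(conj(w) Z))] for the four-point CRV."""
    return sum(cmath.exp(1j * (w.conjugate() * z).real)
               for z in values) / len(values)


def char_fn_pair(w):
    """The same quantity written via the real pair (Re(Z), Im(Z))."""
    return sum(cmath.exp(1j * (w.real * z.real + w.imag * z.imag))
               for z in values) / len(values)


w0 = 0.3 + 0.7j
closed_form = (math.cos(w0.real) + math.cos(w0.imag)) / 2
print(char_fn(w0), char_fn_pair(w0), closed_form)
```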


17.3.7 Transforming Complex Variables 


We next calculate the density of the result of applying a (deterministic) transfor- 
mation to a CRV. The key to the calculation is to treat the CRV as a pair of real 
random variables and to then apply the analogous result regarding the transfor- 
mation of a random real tuple. To that end we recall the following basic theorem 
regarding the transformation of real random vectors. In the theorem’s statement 
we encounter the notion of an open subset of ℝ^n. Loosely speaking, D ⊂ ℝ^n is an
open subset of ℝ^n if to each x ∈ D there corresponds some ε > 0 such that the
ball of radius ε and center x is fully contained in D.¹


¹Thus, D is an open subset of ℝ^n if D ⊂ ℝ^n and if to each x ∈ D there corresponds some
ε > 0 such that each y ∈ ℝ^n satisfying (x − y)^T (x − y) < ε is in D.
Theorem 17.3.4 (Transforming Real Random Vectors). Let g: D → R be a one-to-one
mapping from an open subset D of ℝ^n onto a subset R of ℝ^n. Assume
that g has continuous partial derivatives in D and that the Jacobian determinant
det(∂g(x)/∂x) is at no point of D zero. Let the real random n-vector X have
the density function f_X(·) and satisfy Pr[X ∈ D] = 1. Then the random n-vector
Y = g(X) is of density

    f_Y(y) = \frac{f_X(x)}{\bigl| \det \frac{\partial g(x)}{\partial x} \bigr|} \Bigg|_{x = g^{-1}(y)} \cdot \mathrm{I}\{ y \in R \}.    (17.28)


Using Theorem 17.3.4 we can relate the density of a CRV Z and the joint distri- 
bution of its phase and magnitude. 


Lemma 17.3.5 (The Joint Density of the Magnitude and Phase of a CRV). Let Z
be a CRV of density f_Z(·), and let R = |Z| and Θ ∈ [−π, π) be the magnitude and
argument of Z:

    Z = R \, e^{i \Theta}, \quad R \ge 0, \quad \Theta \in [-\pi, \pi).

Then the joint distribution of the pair (R, Θ) is of density

    f_{R,\Theta}(r, \theta) = r \, f_Z(r e^{i \theta}), \quad r > 0, \quad \theta \in [-\pi, \pi).    (17.29)


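Proof. This result follows directly from Theorem 17.3.4 by computing the absolute
value of the Jacobian determinant of the transformation² (x, y) ↦ (r, θ), where
r = √(x² + y²) and θ = tan⁻¹(y/x):

    \left| \det \begin{pmatrix}
        \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} \\
        \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y}
    \end{pmatrix} \right| = \frac{1}{\sqrt{x^2 + y^2}}.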
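
A quick numerical sanity check of (17.29): if f_Z is a density on ℂ, then r f_Z(r e^{iθ}) must integrate to one over r > 0 and θ ∈ [−π, π). The Python sketch below assumes, purely for illustration, the density f_Z(z) = e^{−|z|²}/π (a choice made here, not one taken from this section) and integrates the polar density with a midpoint rule.

```python
import cmath
import math


def f_z(z):
    """Assumed density on the complex plane (illustration only)."""
    return math.exp(-abs(z) ** 2) / math.pi


def f_r_theta(r, theta):
    """Joint density of magnitude and phase, as given by (17.29)."""
    return r * f_z(cmath.rect(r, theta))


def integrate_polar(n_r=400, n_theta=64, r_max=8.0):
    """Midpoint-rule integral of f_r_theta over (0, r_max) x [-pi, pi)."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for k in range(n_theta):
            theta = -math.pi + (k + 0.5) * dtheta
            total += f_r_theta(r, theta) * dr * dtheta
    return total


total_mass = integrate_polar()
print(total_mass)
```

Dropping the factor r in f_r_theta would make the integral differ from one, which is a convenient way to see the role of the Jacobian.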


For the next change-of-variables result we recall some basic concepts from Complex
Analysis. Given some z_0 ∈ ℂ and some nonnegative real number r ≥ 0, we denote
by D(z_0, r) the disc of radius r that is centered at z_0:

    D(z_0, r) \triangleq \{ z \in \mathbb{C} : |z - z_0| < r \}.

We say that a subset D of the complex plane is open if to each z ∈ D there
corresponds some ε > 0 such that D(z, ε) ⊂ D. Let g: D → ℂ be some function
from an open set D ⊂ ℂ to ℂ. Let z_0 be in D. We say that g(·) is differentiable
at z_0 ∈ D and that its derivative at z_0 is the complex number g′(z_0), if for every
ε > 0 there exists some δ > 0 such that

    \left| \frac{g(z_0 + h) - g(z_0)}{h} - g'(z_0) \right| < \epsilon,    (17.30)


²Here D is the set ℝ² without the origin.
whenever the complex number h ∈ ℂ satisfies 0 < |h| < δ. It is important to note
that here h is complex. If g is differentiable at every z ∈ D, then we say that g is
holomorphic or analytic in D.³


Define the mappings 


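    u, v \colon \{ x, y \in \mathbb{R} : x + iy \in D \} \to \mathbb{R}    (17.31a)
by
    u(x, y) = \operatorname{Re}\bigl( g(x + iy) \bigr),    (17.31b)
and
    v(x, y) = \operatorname{Im}\bigl( g(x + iy) \bigr).    (17.31c)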


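Proposition 17.3.6 (The Cauchy-Riemann Equations). Let D ⊂ ℂ be open and
let g: D → ℂ be analytic in D. Let u, v be defined by (17.31). Then u and v
satisfy the Cauchy-Riemann equations

    \frac{\partial u(x, y)}{\partial x} = \frac{\partial v(x, y)}{\partial y},     (17.32a)
    \frac{\partial u(x, y)}{\partial y} = -\frac{\partial v(x, y)}{\partial x}    (17.32b)

at every x, y ∈ ℝ such that x + iy ∈ D, and

    g'(z) = \left( \frac{\partial u(x, y)}{\partial x} + i \, \frac{\partial v(x, y)}{\partial x} \right) \Bigg|_{(x, y) = (\operatorname{Re}(z), \operatorname{Im}(z))}, \quad z \in D.    (17.33)

Moreover, the partial derivatives in (17.32) are continuous in the subset of ℝ²
defined by {x, y ∈ ℝ : x + iy ∈ D}.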


Proof. See (Rudin, 1974, Chapter 11, Theorem 11.2 & Theorem 11.4) or (Nehari, 
1975, Chapter II, Section 5 & Chapter III, Section 3). 


We can now state the change-of-variables theorem for CRVs. 


Theorem 17.3.7 (Transforming Complex Random Variables). Let g: D → R be
a one-to-one mapping from an open subset D of ℂ onto a subset R of ℂ. Assume
that g is analytic in D and that at no point of D is the derivative of g zero. Let
the CRV Z have the density function f_Z(·) and satisfy Pr[Z ∈ D] = 1. Then the CRV
defined by W = g(Z) is of density

    f_W(w) = \frac{f_Z(z)}{|g'(z)|^2} \Bigg|_{z = g^{-1}(w)} \cdot \mathrm{I}\{ w \in R \}.    (17.34)

Here g⁻¹(w) denotes the point in D that is mapped by g to w.


3There is some confusion in the literature about the terms analytic, holomorphic, and 
regular. We are following here (Rudin, 1974). 
Note 17.3.8. The square in (17.34) does not appear in dealing with real random
variables. It appears here because a mapping of complex numbers is essentially
two-dimensional: scaling by α ∈ ℂ translates to a scaling of area by |α|².


Proof. To prove (17.34) we begin by expressing the function g(·) as

    g(x + iy) = u(x, y) + i \, v(x, y), \quad (x, y \in \mathbb{R}, \; x + iy \in D),

where u(x, y) = Re(g(x + iy)) and v(x, y) = Im(g(x + iy)) are defined in (17.31b)
and (17.31c). The density of g(Z) is, by definition, the joint density of the pair
u(Re(Z), Im(Z)), v(Re(Z), Im(Z)). And the joint density of the pair (Re(Z), Im(Z)) 
is just the density of Z. Thus, if we could relate the joint density of the pair 
u(Re(Z), Im(Z)), v(Re(Z), Im(Z)) to the joint density of the pair (Re(Z), Im(Z)), 
then we could relate the density of g(Z) to the density of Z. 

To relate the joint density of the pair u(Re(Z), Im(Z)), v(Re(Z), Im(Z)) to the 
joint density of the pair (Re(Z),Im(Z)) we employ Theorem 17.3.4. To that end 
we need to compute the absolute value of the Jacobian determinant. This we do 
as follows: 


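    \left| \det \begin{pmatrix}
        \frac{\partial u}{\partial x} & \frac{\partial u}{\partial y} \\
        \frac{\partial v}{\partial x} & \frac{\partial v}{\partial y}
    \end{pmatrix} \right|
    = \left| \det \begin{pmatrix}
        \frac{\partial u}{\partial x} & -\frac{\partial v}{\partial x} \\
        \frac{\partial v}{\partial x} & \frac{\partial u}{\partial x}
    \end{pmatrix} \right|
    = \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial x} \right)^2
    = |g'(x + iy)|^2,    (17.35)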


where the first equality follows from the Cauchy-Riemann equations (17.32); the 
second from a direct calculation of the determinant of a 2 x 2 matrix; and where 
the last equality follows from (17.33). The theorem now follows from (17.35) and 
Theorem 17.3.4. 
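
The identity (17.35), which says that the real Jacobian determinant of an analytic map equals |g′|², can be checked by finite differences. The Python sketch below assumes the example map g(z) = e^z (an illustrative choice; it is analytic everywhere and one-to-one on, e.g., the strip |Im(z)| < π) and compares a central-difference Jacobian determinant with |g′(z)|².

```python
import cmath


def g(z):
    """Example analytic map (assumed for illustration): g(z) = exp(z)."""
    return cmath.exp(z)


def g_prime(z):
    """Derivative of the example map."""
    return cmath.exp(z)


def real_jacobian_det(x, y, h=1e-6):
    """Central-difference Jacobian determinant of (x, y) -> (u, v)."""
    du_dx = (g(complex(x + h, y)).real - g(complex(x - h, y)).real) / (2 * h)
    du_dy = (g(complex(x, y + h)).real - g(complex(x, y - h)).real) / (2 * h)
    dv_dx = (g(complex(x + h, y)).imag - g(complex(x - h, y)).imag) / (2 * h)
    dv_dy = (g(complex(x, y + h)).imag - g(complex(x, y - h)).imag) / (2 * h)
    return du_dx * dv_dy - du_dy * dv_dx


x0, y0 = 0.3, -1.2
jacobian = real_jacobian_det(x0, y0)
derivative_sq = abs(g_prime(complex(x0, y0))) ** 2
print(jacobian, derivative_sq)
```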


17.4 Complex Random Vectors 


We say that Z = (Z^{(1)}, …, Z^{(n)})^T is a complex random vector on the probability
space (Ω, F, P) if it is a mapping from the outcome set Ω to ℂ^n such that the real
vector

    \bigl( \operatorname{Re}(Z^{(1)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(n)}) \bigr)^{T}

comprising the real and imaginary parts of its components is a real random vector
on (Ω, F, P), i.e., if each of the components of Z is a CRV.


We say that the complex random vector Z = (Z^{(1)}, …, Z^{(n)})^T and the complex
random vector W = (W^{(1)}, …, W^{(n)})^T are of equal law (or have the same
distribution) and write Z \stackrel{\mathscr{L}}{=} W, if the real vector taking value in ℝ^{2n} whose components
are the real and imaginary parts of the components of Z has the same distribution
as the analogous vector for W, i.e., if for all x_1, …, x_n, y_1, …, y_n ∈ ℝ

    \Pr\bigl[ \operatorname{Re}(Z^{(1)}) \le x_1, \operatorname{Im}(Z^{(1)}) \le y_1, \ldots, \operatorname{Re}(Z^{(n)}) \le x_n, \operatorname{Im}(Z^{(n)}) \le y_n \bigr]
        = \Pr\bigl[ \operatorname{Re}(W^{(1)}) \le x_1, \operatorname{Im}(W^{(1)}) \le y_1, \ldots, \operatorname{Re}(W^{(n)}) \le x_n, \operatorname{Im}(W^{(n)}) \le y_n \bigr].


The expectation of a complex random vector is the vector consisting of the ex- 
pectation of each of its components. We say that a complex random vector is of 
finite variance if each of its components is a CRV of finite variance. 


17.4.1 The Covariance Matrix


The discussion in Section 17.3.5 can be generalized to random complex vectors. 
The covariance matrix K_ZZ of a finite-variance complex random n-vector Z is
defined as the conjugate-symmetric n × n matrix

    K_{ZZ} \triangleq E\bigl[ (Z - E[Z]) (Z - E[Z])^{\dagger} \bigr].    (17.36)


Once again, this definition is not consistent with viewing the random complex 
vector as a vector of length 2n of real random variables. The latter would have a 
real symmetric 2n x 2n covariance matrix. 


The reader may wonder why we have chosen to define the covariance and the
covariance matrix with the conjugation sign. Why not look at E[(Z − E[Z])(Z − E[Z])^T]?
The reason is that (17.36) is simply much more useful in applications. For example,
for any deterministic α_1, …, α_n ∈ ℂ the variance of Σ_{j=1}^{n} α_j Z^{(j)} can be computed
from K_ZZ (using (17.25)) but not from E[(Z − E[Z])(Z − E[Z])^T].


17.4.2 Proper Complex Random Vectors


The notion of proper random variables extends to vectors: 


Definition 17.4.1 (Proper Complex Random Vector). A complex random vector Z 
is said to be proper if the following three conditions are all met: it is of zero mean; 
it is of finite variance; and 


    E[Z Z^{T}] = 0.    (17.37)


An alternative definition can be given based on linear functionals: 


Proposition 17.4.2. The complex random n-vector Z is proper if, and only if, for
every deterministic vector a ∈ ℂ^n the CRV a^T Z is proper.


Proof. We begin by noting that Z is of zero mean if, and only if, a^T Z is of zero
mean for all a ∈ ℂ^n. This can be seen from the relation

    E[a^{T} Z] = a^{T} E[Z], \quad a \in \mathbb{C}^n.    (17.38)

Indeed, (17.38) demonstrates that if Z is of zero mean then so is a^T Z for every
a ∈ ℂ^n. Conversely, if a^T Z is of zero mean for all a ∈ ℂ^n, then, a fortiori, it must
also be of zero mean for the choice of a = E[Z]^*, which yields that 0 = E[Z]^† E[Z]
and hence that E[Z] must be zero (because E[Z]^† E[Z] is the sum of the squared
magnitudes of the components of E[Z]).


We next note that Z is of finite variance if, and only if, a^T Z is of finite variance
for every a ∈ ℂ^n. The proof is not difficult and is omitted.


We thus continue with the proof under the assumption that Z is of zero mean and
of finite variance. We note that for any deterministic complex vector a ∈ ℂ^n

    E\bigl[ (a^{T} Z)^2 \bigr] = E\bigl[ (a^{T} Z)(a^{T} Z) \bigr]
                               = E\bigl[ (a^{T} Z)(a^{T} Z)^{T} \bigr]
                               = E\bigl[ a^{T} Z Z^{T} a \bigr]
                               = a^{T} E[Z Z^{T}] \, a, \quad a \in \mathbb{C}^n,    (17.39)


where the first equality follows by writing the square of a random variable as the
product of the variable by itself; the second because the transpose of a scalar is
the original scalar; the third by the transpose rule

    (AB)^{T} = B^{T} A^{T},    (17.40)

and the final equality because a is deterministic.


From (17.39) it follows that if Z is proper, then so is a^T Z for all a ∈ ℂ^n. Actually,
(17.39) also proves the reverse implication by substituting A = E[Z Z^T] in the
following fact from Matrix Theory:

    \bigl( a^{T} A a = 0, \; a \in \mathbb{C}^n \bigr) \implies \bigl( A = 0 \bigr), \quad A \text{ symmetric}.    (17.41)

To prove this fact from Matrix Theory assume that A is symmetric, i.e., that

    a^{(j,\ell)} = a^{(\ell,j)}, \quad j, \ell \in \{1, \ldots, n\}.    (17.42)

Let a = e_ℓ, where e_ℓ is all-zero except for its ℓ-th component, which is one. The
equality e_ℓ^T A e_ℓ = 0 for every ℓ ∈ {1, …, n} is equivalent to

    a^{(\ell,\ell)} = 0, \quad \ell \in \{1, \ldots, n\}.    (17.43)

Next choose a = e_j + e_ℓ. The equality

    (e_j + e_\ell)^{T} A (e_j + e_\ell) = 0

for every j, ℓ ∈ {1, …, n} is then equivalent to

    a^{(j,j)} + a^{(j,\ell)} + a^{(\ell,j)} + a^{(\ell,\ell)} = 0, \quad j, \ell \in \{1, \ldots, n\}.    (17.44)

Equations (17.42), (17.43), and (17.44) guarantee that the matrix A is all-zero.
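
The symmetry hypothesis in (17.41) cannot be dropped. The short Python sketch below (an illustration added here, with test vectors chosen arbitrarily) evaluates the quadratic form a^T A a, without conjugation, for a nonzero antisymmetric 2 × 2 matrix, for which the form vanishes identically, and for a nonzero symmetric matrix, for which it does not.

```python
def quad_form(A, a):
    """Return a^T A a (plain transpose, no conjugation) for a 2x2 matrix A."""
    return sum(a[j] * A[j][l] * a[l] for j in range(2) for l in range(2))


antisym = [[0, 1], [-1, 0]]   # nonzero but not symmetric
sym = [[0, 1], [1, 0]]        # nonzero and symmetric

# Arbitrary test vectors, including complex ones.
test_vectors = [(1, 0), (0, 1), (1, 1), (1 + 2j, -3j), (0.5 - 1j, 2 + 2j)]

antisym_forms = [quad_form(antisym, a) for a in test_vectors]
sym_forms = [quad_form(sym, a) for a in test_vectors]

print(antisym_forms)
print(sym_forms)
```

In the proof this is harmless because E[Z Z^T] is always symmetric.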


An important observation regarding complex random vectors is that a linearly- 
transformed proper vector is also proper: 
Proposition 17.4.3 (Linear Transformation of a Proper Random Vector). If the
complex random n-vector Z is proper, then so is the complex random m-vector AZ
for every deterministic m × n complex matrix A.


Proof. We leave it to the reader to verify that the hypothesis that Z is proper
implies that AZ must be of zero mean and of finite variance. To show that AZ
is proper, it thus remains to show that E[(AZ)(AZ)^T] = 0. This we do by direct
calculation:

    E\bigl[ (AZ)(AZ)^{T} \bigr] = E\bigl[ A Z Z^{T} A^{T} \bigr]
                                = A \, E[Z Z^{T}] \, A^{T}
                                = 0,

where the first equality follows from the rule for the transpose of a product, namely,
(AB)^T = B^T A^T; the second because A is deterministic; and the last from the
hypothesis that Z is proper, so E[Z Z^T] = 0.
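
Proposition 17.4.3 can be illustrated by exhaustive enumeration. In the Python sketch below, Z is assumed to be a 2-vector whose components are independent and uniform on {±1, ±i} (so Z is proper), and A is a hypothetical 3 × 2 matrix; both choices are for illustration only. The code computes the mean of AZ, the matrix E[(AZ)(AZ)^T], and, for contrast, E[(AZ)(AZ)^†].

```python
from itertools import product

# Z has two independent components, each uniform on {+1, -1, +i, -i}.
points = [1 + 0j, -1 + 0j, 1j, -1j]
outcomes = list(product(points, repeat=2))   # 16 equiprobable values of Z

# Hypothetical deterministic 3x2 complex matrix.
A = [[1 + 1j, 2], [0.5j, -1], [3, 1 - 2j]]


def apply(matrix, z):
    """Return the matrix-vector product matrix @ z as a list."""
    return [sum(row[k] * z[k] for k in range(len(z))) for row in matrix]


def second_moment(vectors, conj):
    """E[W W^T] if conj is False, E[W W^dagger] if conj is True."""
    m = len(vectors[0])
    return [[sum(w[j] * (w[l].conjugate() if conj else w[l]) for w in vectors)
             / len(vectors) for l in range(m)] for j in range(m)]


W = [apply(A, z) for z in outcomes]
mean_W = [sum(w[j] for w in W) / len(W) for j in range(3)]
pseudo_cov = second_moment(W, conj=False)    # should be the all-zero matrix
cov = second_moment(W, conj=True)            # covariance matrix of AZ

print(mean_W)
print(pseudo_cov)
print(cov)
```

The matrix without conjugation vanishes, while the covariance matrix of AZ is nonzero, as expected of a nondegenerate proper vector.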


17.4.3 The Characteristic Function


The definition we gave in Section 17.3.6 for the characteristic function of a CRV
extends naturally to vectors: the characteristic function Φ_Z: ℂ^n → ℂ of a complex
random n-vector Z is defined as

    \Phi_Z(\varpi) \triangleq E\bigl[ e^{i \operatorname{Re}(\varpi^{\dagger} Z)} \bigr], \quad \varpi \in \mathbb{C}^n.


Invoking the analogous result for tuples of real random variables we have: 


Theorem 17.4.4. The complex random vectors Z and W are of equal law if, and
only if, their characteristic functions are identical:

    \bigl( Z \stackrel{\mathscr{L}}{=} W \bigr) \iff \bigl( \Phi_Z(\varpi) = \Phi_W(\varpi), \; \varpi \in \mathbb{C}^n \bigr).    (17.45)


Corollary 17.4.5. The complex random n-vectors Z and W are of equal law if, and
only if, for every deterministic vector a ∈ ℂ^n the complex random variables a^T Z
and a^T W are of equal law:

    \bigl( Z \stackrel{\mathscr{L}}{=} W \bigr) \iff \bigl( a^{T} Z \stackrel{\mathscr{L}}{=} a^{T} W, \; a \in \mathbb{C}^n \bigr).    (17.46)


Proof. The direction that needs proof is that equality in law of all linear
combinations implies equality in law between the vectors. But this readily follows from
the theorem because equality in law of the linear combinations implies that the
law of ϖ^† Z is equal to the law of ϖ^† W for every ϖ ∈ ℂ^n. This in turn implies that
e^{i Re(ϖ^† Z)} and e^{i Re(ϖ^† W)} are of equal law, from which, upon taking expectations, we obtain that Z
and W have identical characteristic functions. Thus, by the theorem, they are
equal in law.
17.4.4 Transforming Complex Random Vectors


The change of density rule (17.34) can be generalized to analytic multi-variable 
mappings (Exercise 17.6). But here we shall only present a version of this result 
for linear mappings: 


Lemma 17.4.6 (Linearly Transforming Complex Random Vectors). Let the com- 
plex random n-vector W be given by 


W = AZ, 


where A is a nonsingular deterministic complex n × n matrix, and where the complex
random n-vector Z has the density f_Z(·). Then W is of density

    f_W(w) = \frac{1}{|\det A|^2} \, f_Z(A^{-1} w), \quad w \in \mathbb{C}^n.    (17.47)


Proof. The proof is based on viewing the complex n × n linear transformation
from Z to W as a 2n × 2n real transformation, and on then applying Theorem 17.3.4.


Stack the real parts of the components of Z on top of the imaginary parts in a real
random 2n-vector S:

    S = \bigl( \operatorname{Re}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Im}(Z^{(n)}) \bigr)^{T}.    (17.48)

Similarly, stack the real parts of the components of W on top of the imaginary
parts in a real random 2n-vector T:

    T = \bigl( \operatorname{Re}(W^{(1)}), \ldots, \operatorname{Re}(W^{(n)}), \operatorname{Im}(W^{(1)}), \ldots, \operatorname{Im}(W^{(n)}) \bigr)^{T}.

We can then express T as the result of multiplying the random vector S by a
2n × 2n real matrix:

    T = \begin{pmatrix} \operatorname{Re}(A) & -\operatorname{Im}(A) \\ \operatorname{Im}(A) & \operatorname{Re}(A) \end{pmatrix} S,

where Re(A) and Im(A) denote the componentwise real and imaginary parts of A.


The result will follow from Theorem 17.3.4 once we show that the absolute value
of the Jacobian determinant of this transformation is |det A|². Using elementary
row and column operations we compute:

    \det \begin{pmatrix} \operatorname{Re}(A) & -\operatorname{Im}(A) \\ \operatorname{Im}(A) & \operatorname{Re}(A) \end{pmatrix}
        = \det \begin{pmatrix} A & -\operatorname{Im}(A) \\ -iA & \operatorname{Re}(A) \end{pmatrix}
        = \det \begin{pmatrix} A & -\operatorname{Im}(A) \\ 0 & A^{*} \end{pmatrix}
        = (\det A)(\det A^{*})
        = |\det A|^2,

where the first equality follows by the elementary column operations of multiplying
the right columns by (−i) and adding the result to the left columns; the second
from the elementary row operations of multiplying the top rows by i and adding
the result to the bottom rows; the third from the identity

    \det \begin{pmatrix} B & C \\ 0 & D \end{pmatrix} = (\det B)(\det D);

and the last by noting that for any square matrix B

    \det(B^{*}) = (\det B)^{*}.
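
The determinant identity at the heart of this proof can be checked numerically. The Python sketch below builds the real 2n × 2n matrix with blocks Re(A), −Im(A), Im(A), Re(A) for a hypothetical 2 × 2 complex matrix A and compares its determinant with |det A|². The helper det is a plain Gaussian-elimination routine written for this illustration.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [[complex(x) for x in row] for row in M]
    n = len(M)
    d = 1 + 0j
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[piv][c]) == 0:
            return 0j
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            d = -d                       # a row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d


# Hypothetical nonsingular complex matrix.
A = [[1 + 2j, 3 - 1j], [0.5j, 2 + 1j]]
n = len(A)

# Real 2n x 2n matrix [[Re(A), -Im(A)], [Im(A), Re(A)]].
top = [[A[j][k].real for k in range(n)] + [-A[j][k].imag for k in range(n)]
       for j in range(n)]
bottom = [[A[j][k].imag for k in range(n)] + [A[j][k].real for k in range(n)]
          for j in range(n)]

real_det = det(top + bottom)
complex_det = det(A)
print(real_det, abs(complex_det) ** 2)
```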


17.5 Discrete-Time Complex Stochastic Processes 


Definition 12.2.1 of a real stochastic process extends to the complex case as follows. 


Definition 17.5.1 (Complex Stochastic Process). A complex stochastic process
(CSP) (Z(t), t ∈ T) is a collection of complex random variables that are
defined on a common probability space (Ω, F, P) and that are indexed by some
set T.


A CSP (Z(t), t ∈ T) is said to be centered if for each t ∈ T the CRV Z(t) is of
zero mean. Similarly, the CSP is said to be of finite variance if for each t ∈ T the
CRV Z(t) is of finite variance. A discrete-time CSP corresponds to the case where
the index set T is the set of integers ℤ. Discrete-time complex stochastic processes
are not very different from the real-valued ones we encountered in Chapter 13.
Consequently, we shall present the main definitions and results succinctly with
an emphasis on the issues where the complex and real processes differ. As in
Chapter 13, when dealing with a discrete-time CSP we shall use subscripts to
index the complex random variables and denote the process by (Z_ν, ν ∈ ℤ) or,
more succinctly, by (Z_ν).


A discrete-time CSP (Z_ν, ν ∈ ℤ) is said to be stationary, or strict-sense
stationary, or strongly stationary if for every positive integer n and for every
η, η′ ∈ ℤ, the joint distribution of the n-tuple (Z_η, …, Z_{η+n−1}) is identical to the
joint distribution of the n-tuple (Z_{η′}, …, Z_{η′+n−1}). This definition is essentially
identical to the analogous definition for real processes (Definition 13.2.1). Similarly,
Proposition 13.2.2 holds verbatim also for complex stochastic processes.
Proposition 13.2.3 also holds for complex stochastic processes with the slight modification
that the deterministic coefficients α_1, …, α_n are now allowed to be arbitrary
complex numbers:


Proposition 17.5.2. A discrete-time CSP (Z_ν) is stationary if, and only if, for
every n ∈ ℕ, all η, ν_1, …, ν_n ∈ ℤ, and all α_1, …, α_n ∈ ℂ,

    \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \stackrel{\mathscr{L}}{=} \sum_{j=1}^{n} \alpha_j Z_{\nu_j + \eta}.    (17.49)


The definition of a wide-sense stationary CSP is very similar to the analogous 
definition for real processes (Definition 13.3.1). 
Definition 17.5.3 (Wide-Sense Stationary Discrete-Time CSP). We say that
a discrete-time CSP (Z_ν) is wide-sense stationary or weakly stationary or
covariance stationary if the following three conditions all hold:


1) For every ν ∈ ℤ the CRV Z_ν is of finite variance.
2) The mean of Z_ν does not depend on ν.
3) The expectation E[Z_ν Z^*_{ν′}] depends on ν′ and ν only via their difference ν − ν′:

    E[Z_{\nu} Z_{\nu'}^{*}] = E[Z_{\nu + \eta} Z_{\nu' + \eta}^{*}], \quad \nu, \nu', \eta \in \mathbb{Z}.    (17.50)


Note the conjugation in (17.50). We do not require that E[Z_{ν′} Z_ν] be computable
from ν − ν′; it may or may not be. Thus, we do not require that the matrix

    \begin{pmatrix}
        E[\operatorname{Re}(Z_{\nu'}) \operatorname{Re}(Z_{\nu})] & E[\operatorname{Re}(Z_{\nu'}) \operatorname{Im}(Z_{\nu})] \\
        E[\operatorname{Im}(Z_{\nu'}) \operatorname{Re}(Z_{\nu})] & E[\operatorname{Im}(Z_{\nu'}) \operatorname{Im}(Z_{\nu})]
    \end{pmatrix}

be computable from ν − ν′. This matrix is, however, computable from ν − ν′ if the
process is proper:


Definition 17.5.4 (Proper CSP). A discrete-time CSP (Z_ν) is said to be proper
if the following three conditions all hold: it is centered; it is of finite variance; and

    E[Z_{\nu} Z_{\nu'}] = 0, \quad \nu, \nu' \in \mathbb{Z}.    (17.51)


Equivalently, a discrete-time CSP (Z_ν) is proper if, and only if, for every positive
integer n and all ν_1, …, ν_n ∈ ℤ the complex random vector (Z_{ν_1}, …, Z_{ν_n})^T is
proper. Equivalently, (Z_ν) is proper if, and only if, for every positive integer n, all
α_1, …, α_n ∈ ℂ, and all ν_1, …, ν_n ∈ ℤ

    \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \text{ is proper}    (17.52)

(Proposition 17.4.2).
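
To see why the conjugation in the WSS condition matters, consider the hypothetical process Z_ν = X e^{i2πνF0}, where X is a random amplitude and F0 a fixed normalized frequency (both are assumptions made for this illustration). The Python sketch below computes E[Z_ν Z^*_{ν′}] and E[Z_ν Z_{ν′}] exactly for two index pairs with the same difference ν − ν′. With X uniform on {±1} the conjugated moment depends only on the difference, but the unconjugated one does not, and the process is not proper. With X uniform on {±1, ±i} the unconjugated moment vanishes.

```python
import cmath

F0 = 0.125   # hypothetical normalized frequency


def moments(x_values, nu, nu_prime):
    """Return (E[Z_nu conj(Z_nu')], E[Z_nu Z_nu']) for Z_k = X exp(i 2 pi k F0),
    with X uniform on x_values."""
    p = 1 / len(x_values)

    def rot(k):
        return cmath.exp(2j * cmath.pi * k * F0)

    corr = sum(p * (x * rot(nu)) * (x * rot(nu_prime)).conjugate()
               for x in x_values)
    pseudo = sum(p * (x * rot(nu)) * (x * rot(nu_prime)) for x in x_values)
    return corr, pseudo


real_amp = [1 + 0j, -1 + 0j]             # X uniform on {+1, -1}
proper_amp = [1 + 0j, -1 + 0j, 1j, -1j]  # X uniform on {+1, -1, +i, -i}

corr_a, pseudo_a = moments(real_amp, 3, 1)   # nu - nu' = 2
corr_b, pseudo_b = moments(real_amp, 5, 3)   # nu - nu' = 2 again
_, pseudo_proper = moments(proper_amp, 3, 1)

print(corr_a, corr_b)       # equal: depends only on the difference
print(pseudo_a, pseudo_b)   # differ: depends on nu + nu'
print(pseudo_proper)        # vanishes (up to rounding)
```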

The alternative definition of WSS real processes in terms of the variance of linear
functionals of the process (Proposition 13.3.3) requires little change:


Proposition 17.5.5. A finite-variance discrete-time CSP (Z_ν) is WSS if, and only
if, for every n ∈ ℕ, all η, ν_1, …, ν_n ∈ ℤ, and all α_1, …, α_n ∈ ℂ

    \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \text{ and } \sum_{j=1}^{n} \alpha_j Z_{\nu_j + \eta} \text{ have the same mean \& variance.}    (17.53)


Proof. We begin by assuming that (Z_ν) is WSS and prove (17.53). The equality
of expectations follows directly from the linearity of expectation and from the fact
that because (Z_ν) is WSS the mean of Z_ν does not depend on ν. In proving the
equality of the variances we use (17.25):

    \operatorname{Var}\Bigl[ \sum_{j=1}^{n} \alpha_j Z_{\nu_j + \eta} \Bigr]
        = \sum_{j=1}^{n} \sum_{j'=1}^{n} \alpha_j \alpha_{j'}^{*} \operatorname{Cov}[Z_{\nu_j + \eta}, Z_{\nu_{j'} + \eta}]
        = \sum_{j=1}^{n} \sum_{j'=1}^{n} \alpha_j \alpha_{j'}^{*} \operatorname{Cov}[Z_{\nu_j}, Z_{\nu_{j'}}]
        = \operatorname{Var}\Bigl[ \sum_{j=1}^{n} \alpha_j Z_{\nu_j} \Bigr],

where the second equality follows from the wide-sense stationarity of (Z_ν) and the
last equality again from (17.25).


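We next turn to proving that (17.53) implies that (Z_ν) is WSS. Choosing n = 1 and
α_1 = 1 we obtain, by considering the equality of the means, that E[Z_ν] = E[Z_{ν+η}]
for all η ∈ ℤ, i.e., that the mean of the process is constant. And, by considering
the equality of the variances, we obtain that the random variables (Z_ν) all have
the same variance

    \operatorname{Var}[Z_{\nu}] = \operatorname{Var}[Z_{\nu + \eta}], \quad \nu, \eta \in \mathbb{Z}.    (17.54)

Choosing n = 2 and α_1 = α_2 = 1 we obtain from the equality of the variances

    \operatorname{Var}[Z_{\nu_1} + Z_{\nu_2}] = \operatorname{Var}[Z_{\nu_1 + \eta} + Z_{\nu_2 + \eta}].    (17.55)

But, by (17.25) and (17.54),

    \operatorname{Var}[Z_{\nu_1} + Z_{\nu_2}] = 2 \operatorname{Var}[Z_1] + 2 \operatorname{Re}\bigl( \operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}] \bigr)    (17.56)

and similarly

    \operatorname{Var}[Z_{\nu_1 + \eta} + Z_{\nu_2 + \eta}] = 2 \operatorname{Var}[Z_1] + 2 \operatorname{Re}\bigl( \operatorname{Cov}[Z_{\nu_1 + \eta}, Z_{\nu_2 + \eta}] \bigr).    (17.57)

By (17.55), (17.56), and (17.57)

    \operatorname{Re}\bigl( \operatorname{Cov}[Z_{\nu_1 + \eta}, Z_{\nu_2 + \eta}] \bigr) = \operatorname{Re}\bigl( \operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}] \bigr), \quad \eta, \nu_1, \nu_2 \in \mathbb{Z}.    (17.58)

We now repeat the argument with α_1 = 1 and α_2 = i:

    \operatorname{Var}[Z_{\nu_1} + i Z_{\nu_2}] = \operatorname{Var}[Z_{\nu_1}] + \operatorname{Var}[Z_{\nu_2}] + 2 \operatorname{Re}\bigl( \operatorname{Cov}[Z_{\nu_1}, i Z_{\nu_2}] \bigr)
                                                = 2 \operatorname{Var}[Z_1] + 2 \operatorname{Im}\bigl( \operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}] \bigr)

and similarly

    \operatorname{Var}[Z_{\nu_1 + \eta} + i Z_{\nu_2 + \eta}] = 2 \operatorname{Var}[Z_1] + 2 \operatorname{Im}\bigl( \operatorname{Cov}[Z_{\nu_1 + \eta}, Z_{\nu_2 + \eta}] \bigr),

so the equality of the variances implies

    \operatorname{Im}\bigl( \operatorname{Cov}[Z_{\nu_1 + \eta}, Z_{\nu_2 + \eta}] \bigr) = \operatorname{Im}\bigl( \operatorname{Cov}[Z_{\nu_1}, Z_{\nu_2}] \bigr), \quad \eta, \nu_1, \nu_2 \in \mathbb{Z},

which combines with (17.58) to prove Cov[Z_{ν_1+η}, Z_{ν_2+η}] = Cov[Z_{ν_1}, Z_{ν_2}].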
As with real processes, a comparison of Propositions 17.5.5 and 17.5.2 yields that
any finite-variance stationary CSP is also WSS. The reverse is not true. 


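Definition 17.5.6 (Autocovariance Function). We define the autocovariance
function K_ZZ: ℤ → ℂ of a discrete-time WSS CSP (Z_ν) as⁴

    K_{ZZ}(\eta) \triangleq \operatorname{Cov}[Z_{\nu + \eta}, Z_{\nu}]    (17.59)
                 = E\bigl[ (Z_{\nu + \eta} - E[Z_1]) (Z_{\nu} - E[Z_1])^{*} \bigr], \quad \eta \in \mathbb{Z}.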


By mimicking the derivations of (13.12) (taking into account the conjugate
symmetry (17.18)) we obtain that the autocovariance function K_ZZ of every discrete-time
WSS CSP (Z_ν) satisfies the conjugate-symmetry condition

    K_{ZZ}(-\eta) = K_{ZZ}^{*}(\eta), \quad \eta \in \mathbb{Z}.    (17.60)

Similarly, by mimicking the derivation of (13.13) (i.e., from the nonnegativity of
the variance and from (17.25)), we obtain that the autocovariance function of such
a process satisfies

    \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K_{ZZ}(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C}.    (17.61)

In analogy to the real case, (17.60) and (17.61) fully characterize the possible
autocovariance functions in the sense that any function K: ℤ → ℂ satisfying

    K(-\eta) = K^{*}(\eta), \quad \eta \in \mathbb{Z}    (17.62)

and

    \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K(\nu - \nu') \ge 0, \quad \alpha_1, \ldots, \alpha_n \in \mathbb{C}    (17.63)

is the autocovariance function of some discrete-time WSS CSP.⁵ If K: ℤ → ℂ
satisfies (17.62) and (17.63), then we say that K(·) is a positive definite function
from the integers to the complex field.


Definition 13.16 of the power spectral density S_ZZ requires no change. We
require that S_ZZ be integrable on the interval [−1/2, 1/2) and that

    K_{ZZ}(\eta) = \int_{-1/2}^{1/2} S_{ZZ}(\theta) \, e^{-i 2 \pi \eta \theta} \, d\theta, \quad \eta \in \mathbb{Z}.    (17.64)


Proposition 13.6.3 does require some alteration. Indeed, for complex stochastic 
processes the PSD need not be a symmetric function. However, the main result 
(that the PSD is real and nonnegative) remains true: 


⁴Some authors, e.g., (Grimmett and Stirzaker, 2001), define K_ZZ(m) as Cov[Z_ν, Z_{ν+m}]. Our
definition follows (Doob, 1990).

⁵In fact, it is the autocovariance function of some proper Gaussian stochastic process.
Complex Gaussian random processes will be discussed in Chapter 24.
Proposition 17.5.7 (PSDs of Complex Processes Are Nonnegative).


(i) If the discrete-time WSS CSP (Z_ν) is of PSD S_ZZ, then

    S_{ZZ}(\theta) \ge 0,    (17.65)

except possibly on a subset of the interval [−1/2, 1/2) of Lebesgue measure
zero.


(ii) If a function S: [−1/2, 1/2) → ℝ is integrable and nonnegative, then there
exists a proper discrete-time WSS CSP⁶ (Z_ν) whose PSD S_ZZ is given by

    S_{ZZ}(\theta) = S(\theta), \quad \theta \in [-1/2, 1/2).


As in the real case, by possibly changing the value of S_ZZ on the set of Lebesgue
measure zero where (17.65) is violated, we can obtain a power spectral density that
is nonnegative for all θ ∈ [−1/2, 1/2). Consequently, we shall always assume that
the PSD, if it exists, is nonnegative for all θ ∈ [−1/2, 1/2).


Proof. We begin with Part (i) where we need to prove the nonnegativity of the 
PSD. We shall only sketch the proof. We recommend reading the appendix through 
Theorem A.2.2 before reading this proof. 


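Let K_ZZ denote the autocovariance function of the WSS CSP (Z_ν). Applying
(17.61) with

    \alpha_{\nu} = e^{i 2 \pi \nu \theta}, \quad \nu \in \{1, \ldots, n\},

and thus

    \alpha_{\nu'}^{*} = e^{-i 2 \pi \nu' \theta}, \quad \nu' \in \{1, \ldots, n\},

we obtain

    0 \le \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K_{ZZ}(\nu - \nu')
      = \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} e^{i 2 \pi (\nu - \nu') \theta} K_{ZZ}(\nu - \nu')
      = \sum_{\eta=-(n-1)}^{n-1} (n - |\eta|) \, e^{i 2 \pi \eta \theta} K_{ZZ}(\eta), \quad \theta \in [-1/2, 1/2).

Dividing by n, we obtain

    0 \le \sum_{\eta=-(n-1)}^{n-1} \Bigl( 1 - \frac{|\eta|}{n} \Bigr) e^{i 2 \pi \eta \theta} K_{ZZ}(\eta)
      = \sum_{\eta=-(n-1)}^{n-1} \Bigl( 1 - \frac{|\eta|}{n} \Bigr) e^{i 2 \pi \eta \theta} \hat{S}_{ZZ}(\eta)
      = (k_{n-1} \star S_{ZZ})(\theta), \quad \theta \in [-1/2, 1/2),


⁶The process can be taken to be Gaussian; see Chapter 24.

where in the equality on the second line \hat{S}_{ZZ}(η) denotes the η-th Fourier Series
Coefficient of S_ZZ and we use (17.64); and in the subsequent equality on the third
line k_n denotes the degree-n Fejér kernel (Definition A.1.3).


We have thus established that k_{n−1} ⋆ S_ZZ is nonnegative. The result now follows
from Theorem A.2.2 which guarantees that

    \lim_{n \to \infty} \int_{-1/2}^{1/2} \bigl| S_{ZZ}(\theta) - (k_n \star S_{ZZ})(\theta) \bigr| \, d\theta = 0.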


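The proof of Part (ii) is very similar to the proof of the analogous result for real
processes. As in (13.21), we define

    K(\eta) \triangleq \int_{-1/2}^{1/2} S(\theta) \, e^{-i 2 \pi \eta \theta} \, d\theta, \quad \eta \in \mathbb{Z},    (17.66)

and we prove that this function satisfies (17.62) and (17.63). To prove (17.62) we
compute

    K(-\eta) = \int_{-1/2}^{1/2} S(\theta) \, e^{-i 2 \pi (-\eta) \theta} \, d\theta
             = \int_{-1/2}^{1/2} S^{*}(\theta) \, e^{i 2 \pi \eta \theta} \, d\theta
             = \Bigl( \int_{-1/2}^{1/2} S(\theta) \, e^{-i 2 \pi \eta \theta} \, d\theta \Bigr)^{*}
             = K^{*}(\eta), \quad \eta \in \mathbb{Z},

where the first equality follows from the definition of K(·) (17.66); the second
because S(·) is, by assumption, real; the third because conjugating the integrand
is equivalent to conjugating the integral; and the final equality again by (17.66).
To prove (17.63) we mimic the derivation of (13.22) with the constants α_1, …, α_n
now being complex:

    \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} K(\nu - \nu')
        = \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} \int_{-1/2}^{1/2} S(\theta) \, e^{-i 2 \pi (\nu - \nu') \theta} \, d\theta
        = \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} \alpha_{\nu'}^{*} e^{-i 2 \pi (\nu - \nu') \theta} \Bigr) d\theta
        = \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_{\nu} e^{-i 2 \pi \nu \theta} \, \alpha_{\nu'}^{*} e^{i 2 \pi \nu' \theta} \Bigr) d\theta
        = \int_{-1/2}^{1/2} S(\theta) \Bigl( \sum_{\nu=1}^{n} \alpha_{\nu} e^{-i 2 \pi \nu \theta} \Bigr) \Bigl( \sum_{\nu'=1}^{n} \alpha_{\nu'} e^{-i 2 \pi \nu' \theta} \Bigr)^{*} d\theta
        = \int_{-1/2}^{1/2} S(\theta) \, \Bigl| \sum_{\nu=1}^{n} \alpha_{\nu} e^{-i 2 \pi \nu \theta} \Bigr|^2 d\theta
        \ge 0.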
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Proposition 13.6.6 needs very little alteration. We only need to drop the symmetry 
property: 


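Proposition 17.5.8 (PSD when $K_{ZZ}$ Is Absolutely Summable). If the autocovariance
function $K_{ZZ}$ of a discrete-time WSS CSP is absolutely summable, i.e.,

\[ \sum_{n=-\infty}^{\infty} \bigl| K_{ZZ}(n) \bigr| < \infty, \tag{17.67} \]

then the function

\[ S(\theta) = \sum_{n=-\infty}^{\infty} K_{ZZ}(n) \, e^{i 2\pi n \theta}, \quad \theta \in [-1/2, 1/2] \tag{17.68} \]

is continuous, nonnegative, and satisfies

\[ \int_{-1/2}^{1/2} S(\theta) \, e^{-i 2\pi n \theta} \, d\theta = K_{ZZ}(n), \quad n \in \mathbb{Z}. \tag{17.69} \]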
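
As a numerical sanity check of Proposition 17.5.8, the following Python sketch evaluates a truncated version of the series (17.68) and then recovers the autocovariance through the integral (17.69). The example autocovariance $K(n) = r^{|n|} e^{i 2\pi n \varphi}$, the parameter values, the truncation length, and all function names are assumptions made for illustration; they do not come from the text.

```python
import cmath
import math

R, PHI = 0.6, 0.2  # hypothetical parameters of the example autocovariance


def K(n):
    """Assumed autocovariance K(n) = R^|n| exp(i 2 pi n PHI); K(-n) = conj(K(n))."""
    return (R ** abs(n)) * cmath.exp(2j * math.pi * n * PHI)


def S(theta, n_max=40):
    """Truncation of the series (17.68) to |n| <= n_max."""
    return sum(K(n) * cmath.exp(2j * math.pi * n * theta)
               for n in range(-n_max, n_max + 1))


def K_from_S(n, grid=512):
    """Midpoint-rule approximation of the integral in (17.69)."""
    h = 1.0 / grid
    thetas = (-0.5 + (k + 0.5) * h for k in range(grid))
    return h * sum(S(t) * cmath.exp(-2j * math.pi * n * t) for t in thetas)


if __name__ == "__main__":
    for theta in (-0.5, -0.2, 0.0, 0.3):
        print(theta, S(theta))      # imaginary parts are numerically zero
    print(K(3), K_from_S(3))        # the two values should agree closely
```

For this particular choice the series sums to a Poisson kernel, which is strictly positive, so the printed values of $S(\theta)$ should be positive reals up to rounding.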


The Spectral Distribution Function that we encountered in Section 13.7 has a 
natural extension to discrete-time WSS CSPs: 


Theorem 17.5.9.
(i) If $(Z_\nu)$ is a WSS CSP of autocovariance function $K_{ZZ}$, then

\[ K_{ZZ}(n) = K_{ZZ}(0) \, \mathsf{E}\bigl[ e^{-i 2\pi n \Theta} \bigr], \quad n \in \mathbb{Z}, \tag{17.70} \]

for some random variable $\Theta$ taking value in the interval $[-1/2, 1/2)$. In
the nontrivial case where $K_{ZZ}(0) > 0$ the distribution function of $\Theta$ is fully
specified by $K_{ZZ}$.

(ii) If $\Theta$ is any random variable taking value in $[-1/2, 1/2)$ and if $\alpha > 0$, then
there exists a proper discrete-time WSS CSP $(Z_\nu)$ whose autocovariance function
$K_{ZZ}$ is given by

\[ K_{ZZ}(n) = \alpha \, \mathsf{E}\bigl[ e^{-i 2\pi n \Theta} \bigr], \quad n \in \mathbb{Z} \tag{17.71} \]

and whose variance is consequently given by $K_{ZZ}(0) = \alpha$.


Proof. See (Shiryaev, 1996, Chapter VI, Section § 1 Theorem 3), (Doob, 1990, 
Chapter X § 3 Theorem 3.2), or (Feller, 1971, Chapter XIX, Section 6, Theorem 3). 
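
To make the form of (17.71) concrete, here is a minimal Python sketch in which $\Theta$ takes only three values. The support points, probabilities, scale $\alpha$, and function names are hypothetical, and the sign in the exponent is assumed to match (17.66). The sketch checks numerically that the resulting $K$ is conjugate-symmetric and that the quadratic form of (17.63) is real and nonnegative; it does not construct the process itself.

```python
import cmath
import math

# Hypothetical discrete distribution of Theta on [-1/2, 1/2) and scale alpha.
SUPPORT = [-0.4, 0.1, 0.25]
PROBS = [0.5, 0.3, 0.2]
ALPHA = 2.0


def K(n):
    """K(n) = alpha * E[exp(-i 2 pi n Theta)] for the assumed discrete Theta."""
    return ALPHA * sum(p * cmath.exp(-2j * math.pi * n * th)
                       for th, p in zip(SUPPORT, PROBS))


def quadratic_form(coeffs):
    """sum over v, w of a_v conj(a_w) K(v - w); expected real and nonnegative."""
    idx = range(len(coeffs))
    return sum(coeffs[v] * coeffs[w].conjugate() * K(v - w)
               for v in idx for w in idx)


if __name__ == "__main__":
    print(K(0))                                   # equals ALPHA
    print(K(-3), K(3).conjugate())                # conjugate symmetry
    print(quadratic_form([1 + 2j, -0.5j, 3.0, -1 + 1j]))
```

The nonnegativity is no accident: the quadratic form equals $\alpha \, \mathsf{E}\bigl[ |\sum_\nu a_\nu e^{-i 2\pi \nu \Theta}|^2 \bigr]$.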


Some authors refer to the mapping $\theta \mapsto \Pr[\Theta \le \theta]$ as the spectral distribution function
of $(Z_\nu)$, but others refer to $\theta \mapsto K_{ZZ}(0) \Pr[\Theta \le \theta]$ as the spectral distribution
function. The latter is more common.


17.6 On the Eigenvalues of Large Toeplitz Matrices


Although it will not be used in this book, we cannot resist stating the following
classic result, which is sometimes called “Szegő’s Theorem.” Let the function
$s\colon [-1/2, 1/2] \to [0, \infty)$ be Lebesgue integrable. Define

\[ c_n = \int_{-1/2}^{1/2} s(\theta) \, e^{-i 2\pi n \theta} \, d\theta, \quad n \in \mathbb{Z}. \tag{17.72} \]

(In some applications $s(\cdot)$ is the PSD of a discrete-time real or complex stochastic
process and $c_n$ is the value of the corresponding autocovariance function at $n$.)

The $n \times n$ matrix

\[
\begin{pmatrix}
c_0 & c_1 & \cdots & c_{n-1} \\
c_{-1} & c_0 & \cdots & c_{n-2} \\
\vdots & \ddots & \ddots & \vdots \\
c_{-n+1} & \cdots & \cdots & c_0
\end{pmatrix}
\]

is positive semidefinite and conjugate-symmetric. Consequently, it has $n$ nonnegative
eigenvalues (counting multiplicity), which we denote by

\[ \lambda_1^{(n)} \le \lambda_2^{(n)} \le \cdots \le \lambda_n^{(n)}. \tag{17.73} \]


As n increases (with s(-) fixed), the number of eigenvalues increases. It turns out 
that we can say something quite precise about the distribution of these eigenvalues. 


Theorem 17.6.1. Let $s\colon [-1/2, 1/2] \to [0, \infty)$ be integrable, and let $\lambda_j^{(n)}$ be as in
(17.73). Let $g\colon [0, \infty) \to \mathbb{R}$ be a continuous function such that the limit
$\lim_{\xi \to \infty} g(\xi)/\xi$ exists and is finite. Then

\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{n} g\bigl( \lambda_j^{(n)} \bigr) = \int_{-1/2}^{1/2} g\bigl( s(\theta) \bigr) \, d\theta. \tag{17.74} \]


Proof. For a proof of a more general statement of this theorem see (Simon, 2005,
Chapter 2, Section 7, Theorem 2.7.13).
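
The statement (17.74) can be illustrated numerically on a toy example. The Python sketch below assumes $s(\theta) = 1 + a \cos(2\pi\theta)$ with $a = 0.8$ (a hypothetical choice), for which $c_0 = 1$, $c_{\pm 1} = a/2$, and all other $c_n$ vanish, so the matrix is tridiagonal. Rather than computing eigenvalues numerically, the sketch relies on the standard closed form for symmetric tridiagonal Toeplitz matrices, $\lambda_j = 1 + a \cos\bigl(j\pi/(n+1)\bigr)$; this closed form and all function names are assumptions of the example, not part of the text.

```python
import math

A_COEF = 0.8  # hypothetical: s(theta) = 1 + A_COEF * cos(2 pi theta) >= 0


def s(theta):
    """The assumed nonnegative function s on [-1/2, 1/2]."""
    return 1.0 + A_COEF * math.cos(2 * math.pi * theta)


def eigenvalues(n):
    """Closed-form eigenvalues of the n x n tridiagonal Toeplitz matrix
    with c_0 = 1 and c_1 = c_{-1} = A_COEF / 2 (assumed standard result)."""
    return [1.0 + A_COEF * math.cos(j * math.pi / (n + 1))
            for j in range(1, n + 1)]


def eigen_average(g, n):
    """Left-hand side of (17.74) before taking the limit."""
    return sum(g(lam) for lam in eigenvalues(n)) / n


def szego_limit(g, grid=20000):
    """Midpoint-rule approximation of the right-hand side of (17.74)."""
    h = 1.0 / grid
    return h * sum(g(s(-0.5 + (k + 0.5) * h)) for k in range(grid))


if __name__ == "__main__":
    limit = szego_limit(math.sqrt)   # g(x) = sqrt(x) satisfies g(x)/x -> 0
    for n in (20, 200, 2000):
        print(n, eigen_average(math.sqrt, n), limit)
```

The choice $g(\xi) = \sqrt{\xi}$ is continuous on $[0, \infty)$ with $g(\xi)/\xi \to 0$, so it meets the hypothesis of the theorem; the printed averages should approach the integral as $n$ grows.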


17.7 Exercises 


Exercise 17.1 (The Distribution of Re(Z) and |Z|). Let the CRV Z be uniformly distributed
over the unit disc $\{z \in \mathbb{C} : |z| \le 1\}$.


(i) What is the density of its real part Re(Z)?


(ii) What is the density of its magnitude |Z|?


Exercise 17.2 (The Density of $Z^2$). Let Z be a CRV of density $f_Z(\cdot)$. Express the density
of $Z^2$ in terms of $f_Z(\cdot)$.


Exercise 17.3 (The Conjugate of a Proper CRV). Must the complex conjugate of a proper
CRV be proper? 


Exercise 17.4 (Product of Proper CRVs). Show that the product of independent proper 
complex random variables is proper. Is the assumption of independence essential? 


Exercise 17.5 (Sums of Proper CRVs). Show that the sum of independent proper complex 
random variables is proper. Is the assumption of independence essential? 


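Exercise 17.6 (Transforming Complex Random Vectors). Let $\mathbf{Z}$ be a complex n-vector
of PDF $f_{\mathbf{Z}}(\cdot)$. Let $\mathbf{W} = g(\mathbf{Z})$, where $g\colon \mathcal{D} \to \mathcal{R}$ is a one-to-one function from an open
subset $\mathcal{D}$ of $\mathbb{C}^n$ to $\mathcal{R} \subseteq \mathbb{C}^n$. Let the mappings $\mathbf{u}, \mathbf{v}\colon \mathbb{R}^{2n} \to \mathbb{R}^n$ be defined for $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$
as

\[ \mathbf{u}\colon (\mathbf{x}, \mathbf{y}) \mapsto \operatorname{Re}\bigl( g(\mathbf{x} + i\mathbf{y}) \bigr) \quad \text{and} \quad \mathbf{v}\colon (\mathbf{x}, \mathbf{y}) \mapsto \operatorname{Im}\bigl( g(\mathbf{x} + i\mathbf{y}) \bigr). \]

Assume that g is differentiable in $\mathcal{D}$ in the sense that for all $j, \ell \in \{1, \ldots, n\}$ the partial
derivatives

\[ \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \quad \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}}, \quad \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \quad \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} \]

exist and are continuous in $\mathcal{D}$, and that they satisfy

\[ \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}} = \frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} \quad \text{and} \quad \frac{\partial u^{(j)}(\mathbf{x}, \mathbf{y})}{\partial y^{(\ell)}} = -\frac{\partial v^{(j)}(\mathbf{x}, \mathbf{y})}{\partial x^{(\ell)}}, \]

where $a^{(j)}$ denotes the j-th component of the vector $\mathbf{a}$. Further assume that the determinant
of the Jacobian matrix

\[
\det g'(\mathbf{z}) = \det
\begin{pmatrix}
\frac{\partial u^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} + i \frac{\partial v^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} & \cdots & \frac{\partial u^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} + i \frac{\partial v^{(1)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} \\
\vdots & \ddots & \vdots \\
\frac{\partial u^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} + i \frac{\partial v^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(1)}} & \cdots & \frac{\partial u^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}} + i \frac{\partial v^{(n)}(\mathbf{x}, \mathbf{y})}{\partial x^{(n)}}
\end{pmatrix}
\]

is at no point in $\mathcal{D}$ zero. Show that the density $f_{\mathbf{W}}(\cdot)$ of $\mathbf{W}$ is given by

\[ f_{\mathbf{W}}(\mathbf{w}) = \frac{f_{\mathbf{Z}}(\mathbf{z})}{\bigl| \det g'(\mathbf{z}) \bigr|^2} \bigg|_{\mathbf{z} = g^{-1}(\mathbf{w})} \cdot \mathrm{I}\{ \mathbf{w} \in \mathcal{R} \}. \]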


Exercise 17.7 (The Cauchy-Schwarz Inequality Revisited). Let $(Z_\nu)$ be a discrete-time
WSS CSP. Show that (17.61) implies

\[ \bigl| \operatorname{Cov}[Z_\ell, Z_{\ell'}] \bigr| \le \operatorname{Var}[Z_1], \quad \ell, \ell' \in \mathbb{Z}. \]


Exercise 17.8 (On the Autocovariance Function of a Discrete-Time CSP). Show that
if $K_{ZZ}$ is the autocovariance function of a discrete-time WSS CSP, then for every $n \in \mathbb{N}$,
the matrix

\[
\begin{pmatrix}
K_{ZZ}(0) & K_{ZZ}(1) & \cdots & K_{ZZ}(n-1) \\
K_{ZZ}(-1) & K_{ZZ}(0) & \cdots & K_{ZZ}(n-2) \\
\vdots & \vdots & \ddots & \vdots \\
K_{ZZ}(-n+1) & K_{ZZ}(-n+2) & \cdots & K_{ZZ}(0)
\end{pmatrix}
\]

is positive semidefinite.


Exercise 17.9 (Reversing the Direction of Time). Let $K_{ZZ}$ be the autocovariance function
of some discrete-time WSS CSP $(Z_\nu)$. For every $\nu \in \mathbb{Z}$ define $Y_\nu = Z_{-\nu}$. Show that the
time-reversed CSP $(Y_\nu)$ is also a WSS CSP, and express its autocovariance function $K_{YY}$
in terms of $K_{ZZ}$.


Exercise 17.10 (The Sum of Autocovariance Functions). Show that the sum of the 
autocovariance functions of two discrete-time WSS complex stochastic processes is the 
autocovariance function of some discrete-time WSS CSP. 


Exercise 17.11 (The Real Part of an Autocovariance Function). Let $K_{ZZ}$ be the autocovariance
function of some discrete-time WSS CSP $(Z_\nu)$. Show that the mapping
$m \mapsto \operatorname{Re}\bigl( K_{ZZ}(m) \bigr)$ is the autocovariance function of some real SP. Is this also true for the
mapping $m \mapsto \operatorname{Im}\bigl( K_{ZZ}(m) \bigr)$?


Exercise 17.12 (Rotating a WSS CSP). Let $(Z_\ell)$ be a zero-mean WSS discrete-time CSP,
and let $\alpha \in \mathbb{C}$ be fixed. Define the new CSP $(W_\ell)$ as $W_\ell = \alpha^\ell Z_\ell$ for every $\ell \in \mathbb{Z}$.


(i) Show that if $|\alpha| = 1$ then $(W_\ell)$ is WSS. Compute its autocovariance function.


(ii) Does your answer change if $\alpha$ is not of unit magnitude?


Chapter 18 


Energy, Power, and PSD in QAM 


18.1 Introduction 


The calculations of the power and of the operational power spectral density in 
QAM are not just repetitions of the analogous PAM calculations with complex 
notation. They contain two new elements that we shall try to highlight. The 
first is the relationship between the power (as opposed to energy) in passband and 
baseband, and the second is the fact that the energy and power in transmitting
the complex symbols $\{C_\ell\}$ are only related to expectations of the form $\mathsf{E}[C_\ell C_{\ell'}^*]$;
they are uninfluenced by those of the form $\mathsf{E}[C_\ell C_{\ell'}]$.


The signal $(X(t), t \in \mathbb{R})$ (or X for short) that we consider is given by

\[ X(t) = 2 \operatorname{Re}\bigl( X_{BB}(t) \, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}, \tag{18.1} \]

where

\[ X_{BB}(t) = A \sum_{\ell} C_\ell \, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{18.2} \]

Here $A > 0$ is real; the symbols $\{C_\ell\}$ are complex random variables; the pulse
shape g is an integrable complex function that is bandlimited to W/2 Hz; $T_s$ is
positive; and $f_c > W/2$. The range of the summation will depend on the modes
we discuss.

Our focus in this chapter is on X’s energy, power, and operational PSD. These 
quantities are studied in Sections 18.2—18.4, albeit without all the fine mathemat- 
ical details. Those are provided in Sections 18.5 & 18.6, which are recommended 
for the more mathematical readers. The definition of the operational PSD of com- 
plex stochastic processes is very similar to the one of real stochastic processes 
(Definition 15.3.1). It is given in Section 18.4 (Definition 18.4.1). 


18.2. The Energy in QAM 


As in our treatment in Chapter 14 of PAM, we begin with an analysis of the energy
in transmitting K IID random bits $D_1, \ldots, D_K$. We assume that the data bits
are mapped to N complex symbols $C_1, \ldots, C_N$ using a (K, N) binary-to-complex
block-encoder

\[ \mathrm{enc}\colon \{0, 1\}^K \to \mathbb{C}^N \tag{18.3} \]

of rate

\[ \frac{K}{N} \left[ \frac{\text{bit}}{\text{complex symbol}} \right]. \]

The transmitted signal is then:

\begin{align*}
X(t) &= 2 \operatorname{Re}\bigl( X_{BB}(t) \, e^{i 2\pi f_c t} \bigr) \tag{18.4} \\
&= 2 \operatorname{Re}\biggl( A \sum_{\ell=1}^{N} C_\ell \, g(t - \ell T_s) \, e^{i 2\pi f_c t} \biggr), \quad t \in \mathbb{R}, \tag{18.5}
\end{align*}

where the baseband representation of the transmitted signal is

\[ X_{BB}(t) = A \sum_{\ell=1}^{N} C_\ell \, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{18.6} \]

Our interest is in the energy E in X, which is defined by

\[ E \triangleq \mathsf{E}\biggl[ \int_{-\infty}^{\infty} X^2(t) \, dt \biggr]. \tag{18.7} \]

Our assumption that the pulse shape g is bandlimited to W/2 Hz implies that
for every realization of the symbols $\{C_\ell\}$, the signal $X_{BB}(\cdot)$ is also bandlimited
to W/2 Hz. And since we assume that $f_c > W/2$, it follows from Theorem 7.6.10
that the energy in the passband signal $X(\cdot)$ is twice the energy in its baseband
representation $X_{BB}(\cdot)$, i.e.,

\[ E = 2 \, \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]. \tag{18.8} \]

We can thus compute the energy in $X(\cdot)$ by computing the energy in $X_{BB}(\cdot)$ and
doubling the result. The energy of the baseband signal can be computed in much
the same way that the energy was computed in Section 14.2 for PAM. The only
difference is that the baseband signal is now complex:

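\begin{align*}
\mathsf{E}\biggl[ \int_{-\infty}^{\infty} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]
&= A^2 \, \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \biggl| \sum_{\ell=1}^{N} C_\ell \, g(t - \ell T_s) \biggr|^2 dt \biggr] \\
&= A^2 \, \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \biggl( \sum_{\ell=1}^{N} C_\ell \, g(t - \ell T_s) \biggr) \biggl( \sum_{\ell'=1}^{N} C_{\ell'} \, g(t - \ell' T_s) \biggr)^{*} dt \biggr] \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \int_{-\infty}^{\infty} g(t - \ell T_s) \, g^*(t - \ell' T_s) \, dt \\
&= A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, R_{gg}\bigl( (\ell' - \ell) T_s \bigr), \tag{18.9}
\end{align*}

where $R_{gg}$ is the self-similarity function of the pulse shape g (Definition 11.2.1),
i.e.,

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} g(t + \tau) \, g^*(t) \, dt, \quad \tau \in \mathbb{R}. \tag{18.10} \]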


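This expression for the energy in $X_{BB}(\cdot)$ is greatly simplified if the symbols $\{C_\ell\}$
are of zero mean and uncorrelated:

\[ \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \bigl| X_{BB}(t) \bigr|^2 dt \biggr] = A^2 \, \|g\|_2^2 \sum_{\ell=1}^{N} \mathsf{E}\bigl[ |C_\ell|^2 \bigr], \]
\[ \Bigl( \mathsf{E}[C_\ell C_{\ell'}^*] = \mathsf{E}\bigl[ |C_\ell|^2 \bigr] \, \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, N\} \Bigr), \tag{18.11} \]

or if the time shifts of the pulse shape by integer multiples of $T_s$ are orthonormal

\[ \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \bigl| X_{BB}(t) \bigr|^2 dt \biggr] = A^2 \sum_{\ell=1}^{N} \mathsf{E}\bigl[ |C_\ell|^2 \bigr], \]
\[ \biggl( \int_{-\infty}^{\infty} g(t - \ell T_s) \, g^*(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \{1, \ldots, N\} \biggr). \tag{18.12} \]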

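Since g is an integrable function that is bandlimited to W/2 Hz, it is also energy-limited
(Note 6.4.12). Consequently, by Proposition 11.2.2 (iv), we can express
the self-similarity function $R_{gg}$ in (18.9) as the Inverse Fourier Transform of the
mapping $f \mapsto |\hat{g}(f)|^2$:

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} \bigl| \hat{g}(f) \bigr|^2 \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{18.13} \]

With this representation of $R_{gg}$ we obtain from (18.9) an equivalent representation
of the energy as

\[ \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \bigl| X_{BB}(t) \bigr|^2 dt \biggr] = A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df. \tag{18.14} \]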


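Using (18.8), (18.9), and (18.14) we obtain:


Theorem 18.2.1 (Energy in QAM). Assume that $A > 0$, that $T_s > 0$, that $g\colon \mathbb{R} \to \mathbb{C}$
is an integrable signal that is bandlimited to W/2 Hz, and that $f_c > W/2$. Then
the energy E in the QAM signal $X(\cdot)$ of (18.5) is given by

\begin{align*}
E &= 2 A^2 \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, R_{gg}\bigl( (\ell' - \ell) T_s \bigr) \tag{18.15} \\
&= 2 A^2 \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df, \tag{18.16}
\end{align*}

whenever all the complex random variables $C_1, \ldots, C_N$ are of finite variance

\[ \mathsf{E}\bigl[ |C_\ell|^2 \bigr] < \infty, \quad \ell = 1, \ldots, N. \tag{18.17} \]

In analogy to PAM, we define the energy per bit $E_b$ by

\[ E_b \triangleq \frac{E}{K} \tag{18.18} \]

and the energy per complex symbol $E_s$ by

\[ E_s \triangleq \frac{E}{N}. \tag{18.19} \]

Using Theorem 18.2.1, we obtain

\begin{align*}
E_s &= \frac{2 A^2}{N} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, R_{gg}\bigl( (\ell' - \ell) T_s \bigr) \tag{18.20} \\
&= \frac{2 A^2}{N} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df. \tag{18.21}
\end{align*}

Notice that, as promised, only terms of the form $\mathsf{E}[C_\ell C_{\ell'}^*]$ influence the energy;
terms of the form $\mathsf{E}[C_\ell C_{\ell'}]$ do not appear in this analysis.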
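
The double sum in (18.15) is easy to evaluate once the correlations $\mathsf{E}[C_\ell C_{\ell'}^*]$ and the samples of $R_{gg}$ are known. The Python sketch below is one way to do so under stated assumptions: the function names, the table of self-similarity samples, and the example correlations are all hypothetical and chosen only so that the arithmetic can be followed by hand.

```python
def qam_energy(amplitude, corr, r_gg, ts):
    """Evaluate (18.15): E = 2 A^2 sum_l sum_l' E[C_l C_l'^*] R_gg((l' - l) Ts).

    corr[l][lp] holds E[C_l C_lp^*] (0-based indices); r_gg is a callable
    returning the self-similarity function at a given time shift.
    """
    n = len(corr)
    total = sum(corr[l][lp] * r_gg((lp - l) * ts)
                for l in range(n) for lp in range(n))
    return 2.0 * amplitude ** 2 * complex(total).real


def toy_r_gg(tau, ts=1.0):
    """Hypothetical samples of R_gg obeying R_gg(-tau) = conj(R_gg(tau))."""
    table = {0: 1.0 + 0j, 1: 0.2 + 0.1j, -1: 0.2 - 0.1j}
    return table.get(round(tau / ts), 0j)


def toeplitz_corr(n, k_cc):
    """corr[l][lp] = K_CC(l - lp), i.e. symbols taken from a WSS sequence."""
    return [[k_cc(l - lp) for lp in range(n)] for l in range(n)]


if __name__ == "__main__":
    # Uncorrelated symbols with E[|C|^2] = 2 (e.g. symbols +-1 +- i).
    uncorr = [[2.0 if l == lp else 0.0 for lp in range(4)] for l in range(4)]
    print(qam_energy(1.5, uncorr, toy_r_gg, 1.0))

    # Correlated symbols with a hypothetical complex autocovariance.
    k_cc = lambda m: {0: 2.0, 1: 0.5j, -1: -0.5j}.get(m, 0.0)
    print(qam_energy(1.0, toeplitz_corr(3, k_cc), toy_r_gg, 1.0))
```

In the uncorrelated case only the diagonal terms survive, in agreement with (18.11). Note that the function never needs $\mathsf{E}[C_\ell C_{\ell'}]$, which mirrors the remark above.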


18.3. The Power in QAM 


In order to discuss the power in QAM we must consider the transmission of an
infinite sequence of complex symbols $(C_\ell)$. To guarantee convergence, we shall
assume that the pulse shape g, in addition to being an integrable signal that is
bandlimited to W/2 Hz, also satisfies the decay condition

\[ |g(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{18.22} \]

for some $\alpha, \beta > 0$. Also, we shall only consider the transmission of bi-infinite
sequences $(C_\ell)$ that are bounded in the sense that there exists some $\gamma > 0$ such
that every realization of $(C_\ell)$ satisfies

\[ |C_\ell| \le \gamma, \quad \ell \in \mathbb{Z}. \tag{18.23} \]

As for PAM, we shall treat three different scenarios for the generation of $(C_\ell)$. In
the first, we simply ignore the mechanism by which the sequence $(C_\ell)$ is generated
and assume that it forms a wide-sense stationary complex stochastic process. In
the second, we assume bi-infinite block encoding. And in the third we relax the
statistical assumptions and consider the case where the time shifts of g by integer
multiples of $T_s$ are orthonormal. In all these cases the transmitted waveform is
given by

\[ X(t) = 2 \operatorname{Re}\bigl( X_{BB}(t) \, e^{i 2\pi f_c t} \bigr), \quad t \in \mathbb{R}, \tag{18.24} \]

where

\[ X_{BB}(t) = A \sum_{\ell=-\infty}^{\infty} C_\ell \, g(t - \ell T_s), \quad t \in \mathbb{R}. \tag{18.25} \]

It is tempting to derive the power in $X(\cdot)$ by using the complex version of the
PAM results of Section 14.5 to compute the power in $X_{BB}(\cdot)$ and then doubling
the result. This turns out to be a valid approach, but its justification requires some
work. The difficulty is that the powers are defined as

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr] \]

and

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} \bigl| X_{BB}(t) \bigr|^2 dt \biggr], \]

and, Theorem 7.6.10 notwithstanding,

\[ \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr] \ne 2 \, \mathsf{E}\biggl[ \int_{-T}^{T} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]. \tag{18.26} \]

The reason we cannot claim equality in (18.26) is that $t \mapsto X(t) \, \mathrm{I}\{|t| \le T\}$ is not
bandlimited around $f_c$, so Theorem 7.6.10, which relates energies in passband and
baseband, is not applicable. Nevertheless, it turns out that the limits as $T \to \infty$ of
the RHS and the LHS of (18.26) do agree:

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr] = 2 \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]. \tag{18.27} \]

Thus, the power in a QAM signal is, indeed, twice the power in its baseband
representation. This is stated more precisely in Theorem 18.5.2 and is proved in
Section 18.5. With the aid of (18.27) we can now readily compute the power in
QAM.


18.3.1 $(C_\ell)$ Is Zero-Mean and WSS


We next ignore the mechanism by which the symbols $(C_\ell)$ are generated and merely
assume that they form a zero-mean WSS discrete-time CSP of autocovariance
function $K_{CC}$:

\[ \mathsf{E}[C_\ell] = 0, \quad \ell \in \mathbb{Z}, \tag{18.28a} \]

\[ \mathsf{E}[C_{\ell+m} C_\ell^*] = K_{CC}(m), \quad m, \ell \in \mathbb{Z}. \tag{18.28b} \]

The calculation of the RHS of (18.27) is very similar to the analogous computation
in Section 14.5.1 for PAM. The only difference is that here $X_{BB}(\cdot)$ is complex. As
in Section 14.5.1, we begin by computing the energy in a length-$T_s$ interval:


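\begin{align*}
\mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]
&= A^2 \, \mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \biggl| \sum_{\ell=-\infty}^{\infty} C_\ell \, g(t - \ell T_s) \biggr|^2 dt \biggr] \\
&= A^2 \, \mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} C_\ell C_{\ell'}^* \, g(t - \ell T_s) \, g^*(t - \ell' T_s) \, dt \biggr] \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{\ell=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[C_\ell C_{\ell'}^*] \, g(t - \ell T_s) \, g^*(t - \ell' T_s) \, dt \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{m=-\infty}^{\infty} \sum_{\ell'=-\infty}^{\infty} \mathsf{E}[C_{\ell'+m} C_{\ell'}^*] \, g\bigl( t - (\ell' + m) T_s \bigr) \, g^*(t - \ell' T_s) \, dt \\
&= A^2 \int_{\tau}^{\tau + T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \sum_{\ell'=-\infty}^{\infty} g\bigl( t - (\ell' + m) T_s \bigr) \, g^*(t - \ell' T_s) \, dt \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{CC}(m) \sum_{\ell'=-\infty}^{\infty} \int_{\tau - \ell' T_s}^{\tau + T_s - \ell' T_s} g(t' - m T_s) \, g^*(t') \, dt' \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{CC}(m) \int_{-\infty}^{\infty} g^*(t') \, g(t' - m T_s) \, dt' \\
&= A^2 \sum_{m=-\infty}^{\infty} K_{CC}(m) \, R_{gg}^*(m T_s), \tag{18.29}
\end{align*}

where we have substituted $\ell' + m$ for $\ell$ (fourth equality) and $t'$ for $t - \ell' T_s$ (sixth
equality).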


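As in the analogous analysis for real PAM signals, we lower-bound the energy of
$X_{BB}(\cdot)$ in the interval $[-T, +T]$ by

\[ \biggl\lfloor \frac{2T}{T_s} \biggr\rfloor \, \mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \bigl| X_{BB}(t) \bigr|^2 dt \biggr] \]

and upper-bound it by

\[ \biggl\lceil \frac{2T}{T_s} \biggr\rceil \, \mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \bigl| X_{BB}(t) \bigr|^2 dt \biggr], \]

so, by the Sandwich Theorem,

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{+T} \bigl| X_{BB}(t) \bigr|^2 dt \biggr] = \frac{1}{T_s} \, \mathsf{E}\biggl[ \int_{\tau}^{\tau + T_s} \bigl| X_{BB}(t) \bigr|^2 dt \biggr]. \tag{18.30} \]

It thus follows from (18.30) and (18.29) that the power $P_{BB}$ in $X_{BB}(\cdot)$ is

\begin{align*}
P_{BB} &= \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, R_{gg}^*(m T_s) \tag{18.31} \\
&= \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, e^{-i 2\pi f m T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df, \tag{18.32}
\end{align*}

where the second equality follows from (18.13).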


Since the power in passband is twice the power in baseband, we conclude:


Theorem 18.3.1. Let the QAM SP $(X(t))$ be given by (18.24) & (18.25), where
$A$, $T_s$, $g$, $W$, and $f_c$ are as in Theorem 18.2.1. Further assume that g satisfies the
decay condition (18.22) and that the discrete-time SP $(C_\ell)$ is bounded in the sense
of (18.23). If $(C_\ell)$ satisfies (18.28), then $(X(t))$ is a measurable SP, and

\begin{align*}
\lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr]
&= \frac{2 A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, R_{gg}^*(m T_s) \tag{18.33} \\
&= \frac{2 A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, e^{-i 2\pi f m T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df. \tag{18.34}
\end{align*}


Proof. Follows by combining (18.27) (Theorem 18.5.2) and Theorem 14.6.4 (which
extends to the case where the pulse shape and the symbols are complex).

18.3.2 Bi-Infinite Block-Mode 


The second scenario we consider is when $(C_\ell)$ is generated, as in Section 14.5.2, by
applying a binary-to-complex block-encoder $\mathrm{enc}\colon \{0, 1\}^K \to \mathbb{C}^N$ to bi-infinite IID
random bits $(D_j)$. As in Section 14.5.2, we assume that the encoder, when fed IID
random bits, produces symbols of zero mean.


By extending the results of Section 14.5.2 to complex pulse shapes and complex
symbols, we obtain that the power in $X_{BB}(\cdot)$ is given by:

\begin{align*}
P_{BB} &= \frac{1}{N T_s} \, \mathsf{E}\biggl[ \int_{-\infty}^{\infty} \biggl| A \sum_{\ell=1}^{N} C_\ell \, g(t - \ell T_s) \biggr|^2 dt \biggr] \tag{18.35} \\
&= \frac{A^2}{N T_s} \int_{-\infty}^{\infty} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2 \, df. \tag{18.36}
\end{align*}

Using the relationship between power in baseband and passband (18.27) and using
the definitions of E (18.8) and of $E_s$ (18.19), we obtain:


Theorem 18.3.2. Under the assumptions of Theorem 18.3.1, if the symbols $(C_\ell)$
are generated from IID random bits $(D_j)$ in bi-infinite block-mode using the encoder
enc(·), where enc(·) produces zero-mean symbols when fed IID random bits, then
$(X(t))$ is a measurable SP, and

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr] = \frac{E_s}{T_s}, \tag{18.37} \]

where the energy per symbol $E_s$ is defined in (18.19) and is given by (18.20) or
(18.21).


Proof. Follows from Theorem 18.5.2 and by noting that Theorem 14.6.5 also extends
to the case where the pulse shape and the symbols are complex.


18.3.3. Time Shifts of Pulse Shape Are Orthonormal 


We finally address the third scenario where the time shifts of the pulse shape by
integer multiples of $T_s$ are orthonormal. This situation is very prevalent in Digital
Communications and allows for significant simplifications. In this setting we denote
the pulse shape by $\phi(\cdot)$ and state the orthonormality as

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_s) \, \phi^*(t - \ell' T_s) \, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell, \ell' \in \mathbb{Z}. \tag{18.38} \]

The transmitted signal $(X(t), t \in \mathbb{R})$ is thus given as in (18.24) but with

\[ X_{BB}(t) = A \sum_{\ell=-\infty}^{\infty} C_\ell \, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{18.39} \]

where we assume that the discrete-time CSP $(C_\ell)$ satisfies the boundedness condition
(18.23) and that the complex pulse shape $\phi(\cdot)$ satisfies the orthogonality
condition (18.38) and the decay condition

\[ |\phi(t)| \le \frac{\beta}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{18.40} \]

for some $\alpha, \beta > 0$.
Computing the power in $(X_{BB}(t), t \in \mathbb{R})$ using Theorem 14.5.2, which easily
extends to the complex case, we obtain from (18.27):


Theorem 18.3.3. Let the SP $(X(t), t \in \mathbb{R})$ be given by

\[ X(t) = 2 \operatorname{Re}\biggl( A \sum_{\ell=-\infty}^{\infty} C_\ell \, \phi(t - \ell T_s) \, e^{i 2\pi f_c t} \biggr), \quad t \in \mathbb{R}, \tag{18.41} \]

where $A > 0$; $T_s > 0$; the pulse shape $\phi\colon \mathbb{R} \to \mathbb{C}$ is an integrable function that is
bandlimited to W/2 Hz, is Borel measurable, satisfies the orthogonality condition
(18.38), and satisfies the decay condition (18.40); the carrier frequency $f_c$ satisfies
$f_c > W/2 > 0$; and where the CSP $(C_\ell)$ satisfies the boundedness condition (18.23).


Then $(X(t), t \in \mathbb{R})$ is a measurable stochastic process, and

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} X^2(t) \, dt \biggr] = \frac{2 A^2}{T_s} \lim_{L \to \infty} \frac{1}{2L + 1} \sum_{\ell=-L}^{L} \mathsf{E}\bigl[ |C_\ell|^2 \bigr], \tag{18.42} \]

whenever the limit on the RHS exists.


18.4 The Operational PSD of QAM Signals


We shall compute the operational PSD of the QAM signal $(X(t), t \in \mathbb{R})$ (18.24) by
relating it to the operational PSD of the complex signal $(X_{BB}(t), t \in \mathbb{R})$ (18.25)
and by then computing the operational PSD of the latter using techniques similar
to the ones we employed in Chapter 15 in our study of the operational PSD of real
PAM signals. But first we must define the operational PSD of complex stochastic
processes. The definition is very similar to that for real stochastic processes
(Definition 15.3.1), but there are two issues to note. The first is that we do not require
that the operational PSD be a symmetric function, and the second is that we allow
for filters of complex impulse response.


Definition 18.4.1 (Operational PSD of a CSP). We say that a CSP $(Z(t), t \in \mathbb{R})$
is of operational power spectral density $S_{ZZ}$ if $(Z(t), t \in \mathbb{R})$ is a measurable
CSP;¹ the mapping $S_{ZZ}\colon \mathbb{R} \to \mathbb{R}$ is integrable; and for every integrable complex-valued
function $h\colon \mathbb{R} \to \mathbb{C}$ the average power of the convolution of $(Z(t), t \in \mathbb{R})$
and h is given by

\[ \text{Power in } \mathbf{Z} \star h = \int_{-\infty}^{\infty} S_{ZZ}(f) \, \bigl| \hat{h}(f) \bigr|^2 \, df. \tag{18.43} \]


By Lemma 15.3.2 (i) the PSD is unique: 

Note 18.4.2 (The Operational PSD Is Unique). The operational PSD of a CSP 
is unique in the sense that if a CSP is of two different operational power spectral 
densities, then the two must be indistinguishable. 


The relationship between the operational PSD of the real QAM signal $(X(t))$
(18.24) and of the CSP $(X_{BB}(t))$ (18.25) turns out to be very simple. Indeed,
subject to the conditions that are made precise in Theorem 18.6.6, if the baseband
CSP $(X_{BB}(t))$ is of operational PSD $S_{BB}$, then the real QAM SP $(X(t))$ is of
operational PSD $S_{XX}$, where

\[ S_{XX}(f) = S_{BB}\bigl( |f| - f_c \bigr), \quad f \in \mathbb{R}. \tag{18.44} \]

This result is proved in Section 18.6 and relies heavily on the fact that g is bandlimited
to W/2 Hz and that $f_c > W/2$. Here we shall only derive it heuristically
and then see how to apply it.
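
Before the heuristic derivation, the mapping (18.44) can be tried out numerically. The Python sketch below uses a hypothetical, deliberately asymmetric baseband PSD supported on $|f| \le W/2$ with $W = 2$ and carrier $f_c = 5$; these values and all function names are assumptions. It checks that the resulting passband PSD is symmetric and that its integral is twice that of the baseband PSD, in line with (18.27).

```python
def passband_psd(s_bb, fc):
    """Return the mapping f -> S_BB(|f| - fc) of (18.44)."""
    return lambda f: s_bb(abs(f) - fc)


def toy_s_bb(f, w=2.0):
    """Hypothetical asymmetric baseband PSD supported on |f| <= w / 2."""
    return 1.0 + 0.5 * f if abs(f) <= w / 2 else 0.0


def integrate(fn, lo, hi, cells):
    """Midpoint rule on [lo, hi] with the given number of cells."""
    h = (hi - lo) / cells
    return h * sum(fn(lo + (k + 0.5) * h) for k in range(cells))


if __name__ == "__main__":
    s_xx = passband_psd(toy_s_bb, fc=5.0)
    print(s_xx(5.5), s_xx(-5.5))                 # symmetric in f
    print(integrate(toy_s_bb, -8.0, 8.0, 1600))  # baseband power
    print(integrate(s_xx, -8.0, 8.0, 1600))      # passband power
```

The baseband PSD need not be symmetric, yet the passband PSD always is, because it depends on $f$ only through $|f|$.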


Recalling the definition of the operational PSD of a real SP (Definition 15.3.1), we
note that in order to derive (18.44) we need to show that its RHS is an integrable
symmetric function and that

\[ \text{Power in } \mathbf{X} \star h = \int_{-\infty}^{\infty} \bigl| \hat{h}(f) \bigr|^2 \, S_{BB}\bigl( |f| - f_c \bigr) \, df, \tag{18.45} \]

whenever $h\colon \mathbb{R} \to \mathbb{R}$ is integrable. The integrability of $f \mapsto S_{BB}(|f| - f_c)$ follows
directly from the integrability of $S_{BB}(\cdot)$. The symmetry is obvious because the RHS
of (18.44) depends on f only via $|f|$. Our plan for computing the power in $\mathbf{X} \star h$
is to first use the results of Section 7.6.7 to express the baseband representation
of $\mathbf{X} \star h$ in the form $\mathbf{X}_{BB} \star h_{BB}$, where $h_{BB}$ is the baseband representation of the
result of passing h through a unit-gain bandpass filter of bandwidth W around
the carrier frequency $f_c$. Using the relationship between power in passband and
baseband, this will allow us to express the power in $\mathbf{X} \star h$ as twice the power in
$\mathbf{X}_{BB} \star h_{BB}$. Expressing the power in the latter using the operational PSD $S_{BB}(\cdot)$
of $\mathbf{X}_{BB}$ will allow us to complete the calculation of the power in $\mathbf{X} \star h$.

¹ A complex stochastic process is said to be measurable if its real and imaginary parts are
measurable real stochastic processes.


Before executing this plan, we pause here to heuristically argue that, loosely speaking,
the condition that g is bandlimited to W/2 Hz implies that we may assume
that

\[ S_{BB}(f) = 0, \quad |f| > \frac{W}{2}. \tag{18.46} \]

For a precise statement of this result, see Proposition 18.6.3 in Section 18.6.2. The
intuition behind this statement is that, since g is bandlimited to W/2 Hz, in some
loose sense, all the power of the signal $\mathbf{X}_{BB}$ is contained in the band $|f| \le W/2$.
To heuristically justify (18.46), we shall show that if $S_{BB}(\cdot)$ is an operational PSD
for $(X_{BB}(t))$, then so is the mapping $f \mapsto S_{BB}(f) \, \mathrm{I}\{|f| \le W/2\}$. This follows by
noting that for every $h\colon \mathbb{R} \to \mathbb{C}$ in $\mathcal{L}_1$

\begin{align*}
\text{Power in } \mathbf{X}_{BB} \star h
&= \text{Power in } \Bigl( t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \Bigr) \star h \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell \, (g \star h)(t - \ell T_s) \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell \, \bigl( (g \star \mathrm{LPF}_{W/2}) \star h \bigr)(t - \ell T_s) \\
&= \text{Power in } t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell \, \bigl( g \star (h \star \mathrm{LPF}_{W/2}) \bigr)(t - \ell T_s) \\
&= \text{Power in } \Bigl( t \mapsto A \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \Bigr) \star (h \star \mathrm{LPF}_{W/2}) \\
&= \int_{-\infty}^{\infty} S_{BB}(f) \, \bigl| \hat{h}(f) \, \mathrm{I}\{|f| \le W/2\} \bigr|^2 \, df \\
&= \int_{-\infty}^{\infty} \bigl( S_{BB}(f) \, \mathrm{I}\{|f| \le W/2\} \bigr) \, \bigl| \hat{h}(f) \bigr|^2 \, df,
\end{align*}

from which the result follows from the uniqueness (to within indistinguishability)
of the operational PSD (Note 18.4.2). Here the first equality follows from the
definition of $\mathbf{X}_{BB}$ (18.25); the second because convolving a PAM signal of pulse
shape g (in our case complex) with h is tantamount to replacing the pulse shape g
with the new pulse shape $g \star h$ (see the derivation of (15.16) in Section 15.4 which
extends verbatim to the complex case); the third because, by assumption, g is
bandlimited to W/2 Hz; the fourth by the associativity of convolution (see Theorem 5.6.1,
which, strictly speaking, is not applicable here because $\mathrm{LPF}_{W/2}$ is not
integrable); the fifth because replacing the pulse shape g by $g \star (h \star \mathrm{LPF}_{W/2})$ is
tantamount to convolving the PAM signal with $(h \star \mathrm{LPF}_{W/2})$; the sixth from our
assumption that $S_{BB}(\cdot)$ is an operational PSD for $\mathbf{X}_{BB}$ (and by ignoring the fact
that $h \star \mathrm{LPF}_{W/2}$ need not be integrable); and the seventh by trivial algebra.


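Having established (18.46), we are now ready to compute the power in $\mathbf{X} \star h$.
Using the results of Section 7.6.7 we obtain that for every integrable $h\colon \mathbb{R} \to \mathbb{R}$,
the baseband representation of $\mathbf{X} \star h$ is given by $\mathbf{X}_{BB} \star h_{BB}$ where $h_{BB}\colon \mathbb{R} \to \mathbb{C}$ is
the baseband representation of the result of passing h through a unit-gain bandpass
filter of bandwidth W around the carrier frequency $f_c$:

\[ \hat{h}_{BB}(f) = \hat{h}(f + f_c) \, \mathrm{I}\{|f| \le W/2\}, \quad f \in \mathbb{R}. \tag{18.47} \]

And since the power in passband is twice the power in baseband, we conclude that

\begin{align*}
\text{Power in } \mathbf{X} \star h
&= 2 \, \text{Power in } \mathbf{X}_{BB} \star h_{BB} \\
&= 2 \int_{-\infty}^{\infty} S_{BB}(f) \, \bigl| \hat{h}_{BB}(f) \bigr|^2 \, df \\
&= 2 \int_{-\infty}^{\infty} S_{BB}(f) \, \bigl| \hat{h}(f + f_c) \bigr|^2 \, \mathrm{I}\{|f| \le W/2\} \, df \\
&= 2 \int_{-\infty}^{\infty} S_{BB}(f) \, \bigl| \hat{h}(f + f_c) \bigr|^2 \, df \\
&= 2 \int_{-\infty}^{\infty} S_{BB}(\tilde{f} - f_c) \, \bigl| \hat{h}(\tilde{f}) \bigr|^2 \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} S_{BB}(\tilde{f} - f_c) \, \bigl| \hat{h}(\tilde{f}) \bigr|^2 \, d\tilde{f} + \int_{-\infty}^{\infty} S_{BB}(\tilde{f} - f_c) \, \bigl| \hat{h}(-\tilde{f}) \bigr|^2 \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} S_{BB}(f - f_c) \, \bigl| \hat{h}(f) \bigr|^2 \, df + \int_{-\infty}^{\infty} S_{BB}(-f' - f_c) \, \bigl| \hat{h}(f') \bigr|^2 \, df' \\
&= \int_{-\infty}^{\infty} \bigl( S_{BB}(f - f_c) + S_{BB}(-f - f_c) \bigr) \, \bigl| \hat{h}(f) \bigr|^2 \, df \\
&= \int_{-\infty}^{\infty} S_{BB}\bigl( |f| - f_c \bigr) \, \bigl| \hat{h}(f) \bigr|^2 \, df,
\end{align*}

where the first equality follows because the power in passband is twice the power
in baseband; the second because $\mathbf{X}_{BB}$ is of operational PSD $S_{BB}(\cdot)$; the third by
(18.47); the fourth by (18.46); the fifth by changing the integration variable to
$\tilde{f} \triangleq f + f_c$; the sixth because h is real so its Fourier Transform must be conjugate-symmetric;
the seventh by changing the integration variable in the second integral
to $f' \triangleq -\tilde{f}$; the eighth by the linearity of integration; and the final equality by
(18.46) and the assumption that $f_c > W/2$. This establishes (18.45) and thus
concludes the proof of (18.44).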


We next apply (18.44) to calculate the operational PSD of QAM in two scenarios:
when the complex symbols $(C_\ell)$ form a bounded, zero-mean, WSS, CSP and when
they are generated in bi-infinite block-mode.


18.4.1 $(C_\ell)$ Zero-Mean WSS and Bounded


We next use (18.44) to derive the operational PSD of QAM when the discrete-time
CSP $(C_\ell)$ is of zero mean and of autocovariance function $K_{CC}$; see (18.28). To use
(18.44) we first need to compute the operational PSD of the CSP $\mathbf{X}_{BB}$. This is
straightforward. As in Section 15.4.2, we note that $\mathbf{X}_{BB} \star h$ has the same form as
(18.25) with the pulse shape g replaced by $g \star h$. Consequently, by substituting
the FT of $g \star h$ for the FT of g in (18.32),² we obtain that

\[ \text{Power in } \mathbf{X}_{BB} \star h = \int_{-\infty}^{\infty} \biggl( \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, e^{-i 2\pi f m T_s} \, \bigl| \hat{g}(f) \bigr|^2 \biggr) \bigl| \hat{h}(f) \bigr|^2 \, df \tag{18.48} \]

and the operational PSD of $\mathbf{X}_{BB}$ is thus

\[ S_{BB}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, e^{-i 2\pi f m T_s} \, \bigl| \hat{g}(f) \bigr|^2, \quad f \in \mathbb{R}. \tag{18.49} \]

This is the complex analog of (15.21). From (18.49) and (18.44) we now obtain:


Theorem 18.4.3. Under the assumptions of Theorem 18.3.1, the operational PSD
of the QAM signal $(X(t), t \in \mathbb{R})$ is given by

\[ S_{XX}(f) = \frac{A^2}{T_s} \sum_{m=-\infty}^{\infty} K_{CC}(m) \, e^{-i 2\pi (|f| - f_c) m T_s} \, \bigl| \hat{g}\bigl( |f| - f_c \bigr) \bigr|^2, \quad f \in \mathbb{R}. \tag{18.50} \]


Proof. The justification of (18.44) is in Theorem 18.6.6. A formal derivation of
the operational PSD of $(X_{BB}(t), t \in \mathbb{R})$ can be found in Section 18.6.5. We draw
the reader’s attention to the fact that the proof that we gave for the real case in
Section 15.5 is not directly applicable to the complex case because that proof relied
on Theorem 25.14.1 (Wiener-Khinchin), which we prove in Section 25.14 only for
real WSS stochastic processes.³


Figure 18.1 depicts the relationship between the pulse shape g and the operational
PSD of the QAM signal for the case where $K_{CC}(m) = \mathrm{I}\{m = 0\}$ for every $m \in \mathbb{Z}$.
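
As a rough illustration of (18.50), the following Python sketch evaluates the passband PSD for a finite-memory autocovariance. Everything concrete in it is an assumption made for the example: the autocovariance samples, the triangular spectrum standing in for $\hat{g}$, the parameter values, and the function names. The exponent uses the sign $e^{-i 2\pi (|f| - f_c) m T_s}$, which is what (18.13) and (18.29) imply when $K_{CC}(m) = \mathsf{E}[C_{\ell+m} C_\ell^*]$.

```python
import cmath
import math


def qam_psd(f, amplitude, ts, fc, k_cc, g_hat, m_max):
    """Evaluate the RHS of (18.50) with the sum truncated to |m| <= m_max."""
    fb = abs(f) - fc                      # baseband frequency |f| - fc
    spectrum = sum(k_cc(m) * cmath.exp(-2j * math.pi * fb * m * ts)
                   for m in range(-m_max, m_max + 1))
    return amplitude ** 2 / ts * complex(spectrum).real * abs(g_hat(fb)) ** 2


def toy_k_cc(m):
    """Hypothetical autocovariance with K_CC(-m) = conj(K_CC(m))."""
    return {0: 2.0, 1: 0.5j, -1: -0.5j}.get(m, 0.0)


def toy_g_hat(f, w=1.0):
    """Hypothetical triangular spectrum supported on |f| <= w / 2."""
    return max(0.0, 1.0 - abs(f) / (w / 2))


if __name__ == "__main__":
    for f in (9.8, 10.0, 10.2, -10.2, 11.0):
        print(f, qam_psd(f, 1.0, 1.0, 10.0, toy_k_cc, toy_g_hat, m_max=1))
```

For this toy autocovariance the sum collapses to $2 + \sin\bigl( 2\pi (|f| - f_c) T_s \bigr)$, so the PSD is real and nonnegative but not symmetric about the carrier; it remains symmetric in $f$, as any PSD of a real process must be.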


18.4.2. The Operational PSD of QAM in Bi-Infinite Block-Mode


The operational PSD of QAM in bi-infinite block-mode can also be computed
using (18.44). All we need is the operational PSD of $(X_{BB}(t))$, which can be
computed from (18.36) as follows. As in Section 15.4.2, we note that $\mathbf{X}_{BB} \star h$ has
the same form as (18.25) with the pulse shape g replaced by $g \star h$. Consequently,
by substituting the FT of $g \star h$ for the FT of g in (18.36), we obtain that

\[ \text{Power in } \mathbf{X}_{BB} \star h = \int_{-\infty}^{\infty} \biggl( \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2 \biggr) \bigl| \hat{h}(f) \bigr|^2 \, df, \tag{18.51} \]

and the operational PSD of $\mathbf{X}_{BB}$ is thus

\[ S_{BB}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi f (\ell' - \ell) T_s} \, \bigl| \hat{g}(f) \bigr|^2, \quad f \in \mathbb{R}. \tag{18.52} \]

This is the complex analog of (15.23). (But note that, in our present case, $S_{BB}(\cdot)$
need not be a symmetric function.) From (18.52) and (18.44) we now obtain:


Theorem 18.4.4 (Operational PSD of QAM in Bi-Infinite Block-Mode). Under
the assumptions of Theorem 18.3.2, the operational PSD $S_{XX}$ of the QAM signal
$(X(t), t \in \mathbb{R})$ is given for every $f \in \mathbb{R}$ by

\[ S_{XX}(f) = \frac{A^2}{N T_s} \sum_{\ell=1}^{N} \sum_{\ell'=1}^{N} \mathsf{E}[C_\ell C_{\ell'}^*] \, e^{i 2\pi (|f| - f_c)(\ell' - \ell) T_s} \, \bigl| \hat{g}\bigl( |f| - f_c \bigr) \bigr|^2. \tag{18.53} \]


Proof. The justification of (18.44) is in Theorem 18.6.6, and a formal derivation
of the operational PSD of $(X_{BB}(t))$ is given in Section 18.6.5.


² We are ignoring here the fact that $g \star h$ need not satisfy the required decay condition.
³ The extension to the complex case is not as trivial as one might think because the real and
imaginary parts of a WSS complex SP need not be WSS.


Figure 18.1: The relationship between the Fourier Transform of the pulse shape
$g(\cdot)$ and the operational PSD of a QAM signal. The symbols $(C_\ell)$ are assumed to
be of zero mean and uncorrelated.


18.5 A Formal Account of Power in Passband and Baseband 


In this section we formulate conditions under which (18.27) holds, i.e., under which
the power in passband is twice the power in baseband. We first extend the Triangle
Inequality (4.14) to stochastic processes.


Proposition 18.5.1 (Triangle Inequality for Stochastic Processes). Let $(X(t))$
and $(Y(t))$ be (real or complex) measurable stochastic processes, and let $a < b$ be
arbitrary real numbers. Suppose further that

\[ \mathsf{E}\biggl[ \int_a^b |X(t)|^2 \, dt \biggr], \; \mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \, dt \biggr] < \infty. \tag{18.54} \]

Then

\[ \sqrt{\mathsf{E}\biggl[ \int_a^b |X(t)|^2 \, dt \biggr]} - \sqrt{\mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \, dt \biggr]} \le \sqrt{\mathsf{E}\biggl[ \int_a^b |X(t) + Y(t)|^2 \, dt \biggr]} \le \sqrt{\mathsf{E}\biggl[ \int_a^b |X(t)|^2 \, dt \biggr]} + \sqrt{\mathsf{E}\biggl[ \int_a^b |Y(t)|^2 \, dt \biggr]}. \tag{18.55} \]

This also holds when a is replaced with $-\infty$ and/or b is replaced with $+\infty$.


Proof. Replace all integrals in the proof of (4.14) with expectations of integrals.


We can now state the main result of this section relating power in passband and
baseband.


Theorem 18.5.2. Let $T_s$, $g$, $W$, and $f_c$ be as in Theorem 18.2.1 and, additionally,
assume that g satisfies the decay condition (18.22) and that the CSP $(C_\ell)$ is
bounded in the sense of (18.23). Then the condition

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} \biggl| \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \biggr|^2 dt \biggr] = P \tag{18.56} \]

is equivalent to the condition

\[ \lim_{T \to \infty} \frac{1}{2T} \, \mathsf{E}\biggl[ \int_{-T}^{T} \biggl( 2 \operatorname{Re}\biggl( \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \, e^{i 2\pi f_c t} \biggr) \biggr)^{2} dt \biggr] = 2P. \tag{18.57} \]


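The rest of this section is dedicated to proving this theorem. To simplify the
notation we begin by showing that it suffices to prove the result for the case where
$T_s = 1$. If $T_s > 0$ is not necessarily equal to 1, then we define for every $t \in \mathbb{R}$,

\begin{align*}
\tilde{g}(t) &\triangleq g(t T_s), \\
\tilde{W} &\triangleq W T_s, \\
\tilde{f}_c &\triangleq f_c T_s,
\end{align*}

and note that g is bandlimited to W/2 Hz if, and only if, $\tilde{g}$ is bandlimited to $W T_s/2$
Hz; that

\[ (f_c > W/2) \iff (\tilde{f}_c > \tilde{W}/2); \]

and that g satisfies the decay condition (18.22) if, and only if,

\[ |\tilde{g}(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}. \]

By defining $\tau \triangleq t/T_s$ we obtain that

\[ \frac{1}{2T} \int_{-T}^{T} \biggl| \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \biggr|^2 dt = \frac{1}{2 T/T_s} \int_{-T/T_s}^{T/T_s} \biggl| \sum_{\ell \in \mathbb{Z}} C_\ell \, \tilde{g}(\tau - \ell) \biggr|^2 d\tau, \]

so the power in the mapping $t \mapsto \sum_\ell C_\ell \, g(t - \ell T_s)$ is the same as in the mapping
$\tau \mapsto \sum_\ell C_\ell \, \tilde{g}(\tau - \ell)$. Similarly,

\[ \frac{1}{2T} \int_{-T}^{T} \biggl( 2 \operatorname{Re}\biggl( \sum_{\ell \in \mathbb{Z}} C_\ell \, g(t - \ell T_s) \, e^{i 2\pi f_c t} \biggr) \biggr)^{2} dt = \frac{1}{2 T/T_s} \int_{-T/T_s}^{T/T_s} \biggl( 2 \operatorname{Re}\biggl( \sum_{\ell \in \mathbb{Z}} C_\ell \, \tilde{g}(\tau - \ell) \, e^{i 2\pi \tilde{f}_c \tau} \biggr) \biggr)^{2} d\tau, \]

so the power in the mapping $t \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell \, g(t - \ell T_s) \, e^{i 2\pi f_c t} \bigr)$ is the same as in the
mapping $\tau \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell \, \tilde{g}(\tau - \ell) \, e^{i 2\pi \tilde{f}_c \tau} \bigr)$. Thus, if we establish that the inequality
$\tilde{f}_c > \tilde{W}/2$ implies that the power in the baseband signal $\tau \mapsto \sum_\ell C_\ell \, \tilde{g}(\tau - \ell)$ is equal
to half the power in $\tau \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell \, \tilde{g}(\tau - \ell) \, e^{i 2\pi \tilde{f}_c \tau} \bigr)$, then it will also follow that
the inequality $f_c > W/2$ implies that the power in $t \mapsto \sum_\ell C_\ell \, g(t - \ell T_s)$ is equal to
half the power in $t \mapsto 2 \operatorname{Re}\bigl( \sum_\ell C_\ell \, g(t - \ell T_s) \, e^{i 2\pi f_c t} \bigr)$.


Having established that it suffices to prove the theorem for $T_s = 1$, we assume for
the remainder of this section that $T_s = 1$, so the decay condition (18.22) can be
rewritten as

\[ |g(t)| \le \frac{\beta}{1 + |t|^{1+\alpha}}, \quad t \in \mathbb{R}. \]

As in the proof of Theorem 14.5.2, we shall simplify notation and assume that, in
calculating power as the limiting ratio of the energy in the interval $[-T, T]$ to the
length of the interval, T is restricted to the positive integers. The justification is
identical to the one we gave in proving Theorem 14.5.2; see (14.52).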


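We shall find it convenient to introduce an additional subscript "w" to indicate "windowing." Thus, if we define $X_{\mathrm{BB}}(\cdot)$ as

\[
X_{\mathrm{BB}}(t) = \sum_{\ell\in\mathbb{Z}} C_\ell\, g(t - \ell), \quad t \in \mathbb{R},
\]

then its windowed version $X_{\mathrm{BB,w}}(\cdot)$ is given by

\[
X_{\mathrm{BB,w}}(t) = \sum_{\ell\in\mathbb{Z}} C_\ell\, g(t - \ell)\, \mathrm{I}\{|t| \le T\}, \quad t \in \mathbb{R}.
\]

Similarly $X_{\mathrm{PB,w}}(\cdot)$ is the windowed version of the SP

\[
X_{\mathrm{PB}}(t) = 2\,\mathrm{Re}\Bigl( \sum_{\ell\in\mathbb{Z}} C_\ell\, g(t - \ell)\, e^{i2\pi f_c t} \Bigr), \quad t \in \mathbb{R},
\]

and $g_{\ell,\mathrm{w}}$ is the windowed version of

\[
g_\ell \colon t \mapsto g(t - \ell), \quad \ell \in \mathbb{Z}. \tag{18.59}
\]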


We can now express the power in baseband as the limit, as $T$ tends to infinity, of $\mathrm{E}\bigl[\|X_{\mathrm{BB,w}}\|_2^2\bigr]/(2T)$, and the power in passband as the limit of $\mathrm{E}\bigl[\|X_{\mathrm{PB,w}}\|_2^2\bigr]/(2T)$. Note that, since the function $\mathrm{I}\{\cdot\}$ is real-valued,

\[
X_{\mathrm{PB,w}}(t) = 2\,\mathrm{Re}\bigl( X_{\mathrm{BB,w}}(t)\, e^{i2\pi f_c t} \bigr), \quad t \in \mathbb{R}. \tag{18.60}
\]

But (18.60) notwithstanding, the energy in $X_{\mathrm{PB,w}}$ need not be twice the energy in $X_{\mathrm{BB,w}}$ because the signal $X_{\mathrm{BB,w}}$, unlike its unwindowed version $X_{\mathrm{BB}}$, is not bandlimited. It is time-limited, and as such cannot be bandlimited (Theorem 6.8.2).


The difficulty in proving the theorem is in relating the energy in $X_{\mathrm{PB,w}}$ to the energy in $X_{\mathrm{BB,w}}$ and, specifically, in showing that the difference between half the energy in $X_{\mathrm{PB,w}}$ and the energy in $X_{\mathrm{BB,w}}$, when normalized by $2T$, tends to zero. Aiding us in this is the following lemma relating the energy in passband to the energy in baseband for signals that are not bandlimited.
Lemma 18.5.3. Let $z$ be a complex energy-limited signal that is not necessarily bandlimited, and consider the real signal $x \colon t \mapsto 2\,\mathrm{Re}\bigl(z(t)\, e^{i2\pi f_c t}\bigr)$, where $f_c > 0$ is arbitrary. Then,

\[
\bigl( \|z\|_2 - \sqrt{2}\,\epsilon \bigr)^2 \le \frac{1}{2} \|x\|_2^2 \le \bigl( \|z\|_2 + \sqrt{2}\,\epsilon \bigr)^2, \tag{18.61}
\]

where

\[
\epsilon^2 = \int_{-\infty}^{-f_c} |\hat{z}(f)|^2 \, df. \tag{18.62}
\]


Proof. Expressing the FT of $x$ in terms of the FT of $z$, we obtain that for every $f \in \mathbb{R}$ outside a set of frequencies of Lebesgue measure zero,

\[
\begin{aligned}
\hat{x}(f)\, \mathrm{I}\{f \ge 0\}
&= \hat{z}(f - f_c)\, \mathrm{I}\{f \ge 0\} + \hat{z}^*(-f - f_c)\, \mathrm{I}\{f \ge 0\} \\
&= \hat{z}(f - f_c) + \hat{z}^*(-f - f_c)\, \mathrm{I}\{f \ge 0\} - \hat{z}(f - f_c)\, \mathrm{I}\{f < 0\}.
\end{aligned} \tag{18.63}
\]

We next consider the integral over $f$ of the squared magnitude of the LHS and of the RHS of (18.63). Since $x$ is real, its FT is conjugate-symmetric so, by Parseval's Theorem, the integral of the squared magnitude of the LHS of (18.63) is $\frac{1}{2}\|x\|_2^2$. The integral of the squared magnitude of the first term on the RHS of (18.63) is given by $\|z\|_2^2$. Finally, the integral of the squared magnitude of each of the last two terms on the RHS of (18.63) is $\epsilon^2$ and, since they are orthogonal, the integral of the squared magnitude of their sum is $2\epsilon^2$. The result now follows from the Triangle Inequality (4.14).


Applying Lemma 18.5.3 with the substitution of $x_{\mathrm{BB,w}}$ for $z$ and of $x_{\mathrm{PB,w}}$ for $x$ we obtain upon noting that $f_c > W/2$ that, in order to establish the theorem, it suffices to show that the "out-of-band energy" term

\[
e^2 = \int_{|f| \ge W/2} \bigl| \hat{x}_{\mathrm{BB,w}}(f) \bigr|^2 \, df \tag{18.64}
\]

satisfies

\[
\lim_{T\to\infty} \frac{1}{T}\, e^2 = 0, \tag{18.65}
\]

with the convergence being uniform. That is, we need to show that $e^2/T$ is upper-bounded by some function of $\alpha$, $\beta$, $\gamma$, and $T$ that converges to zero as $T$ tends to infinity with $\alpha$, $\beta$, $\gamma$ held fixed. Aiding us in the calculation of the out-of-band energy is the following lemma.


Lemma 18.5.4. Let $x$ be an energy-limited signal and let $W > 0$.


(i) If $u$ is any energy-limited signal that is bandlimited to $W/2$ Hz, then

\[
\int_{|f| \ge W/2} |\hat{x}(f)|^2 \, df \le \|x - u\|_2^2. \tag{18.66}
\]

(ii) In particular,

\[
\int_{|f| \ge W/2} |\hat{x}(f)|^2 \, df \le \|x\|_2^2. \tag{18.67}
\]


Proof. Part (ii) follows from Parseval's Theorem. Part (i) follows by noting that if $u$ is an energy-limited signal that is bandlimited to $W/2$ Hz, then the Fourier Transforms of $x$ and $x - u$ are indistinguishable for frequencies $f$ that satisfy $|f| \ge W/2$. Consequently,

\[
\int_{|f| \ge W/2} |\hat{x}(f)|^2 \, df = \int_{|f| \ge W/2} |\hat{x}(f) - \hat{u}(f)|^2 \, df \le \|x - u\|_2^2,
\]

where the inequality follows by applying Part (ii) to the signal $x - u$.
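A finite-dimensional analogue of Lemma 18.5.4 can be checked with the DFT. This is a sketch only: the lemma concerns continuous-time signals, whereas here the "signals" are length-`N` vectors, the "band" is a hypothetical set of DFT bins, and Parseval's Theorem is replaced by its DFT counterpart.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 256
band = 20   # assumed: bins with |k| <= band play the role of the band |f| < W/2

k = np.fft.fftfreq(N, d=1.0 / N)          # integer bin indices in [-N/2, N/2)
in_band = np.abs(k) <= band

# An arbitrary "energy-limited" vector x.
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# A "bandlimited" vector u: its DFT is supported on the in-band bins only.
U = np.where(in_band, rng.standard_normal(N) + 1j * rng.standard_normal(N), 0.0)
u = np.fft.ifft(U)

X = np.fft.fft(x)
# DFT Parseval: sum |x[n]|^2 = (1/N) * sum |X[k]|^2.
out_of_band = np.sum(np.abs(X[~in_band]) ** 2) / N
dist_sq = np.sum(np.abs(x - u) ** 2)      # analogue of ||x - u||^2 in (18.66)
energy = np.sum(np.abs(x) ** 2)           # analogue of ||x||^2 in (18.67)

print(out_of_band, dist_sq, energy)
```

Because subtracting `u` leaves the out-of-band bins of `X` untouched, the out-of-band energy of `x` cannot exceed the total energy of `x - u`, which mirrors the argument in the proof.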


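To prove (18.65) fix some integer $\nu \ge 2$ and express $x_{\mathrm{BB,w}}$ as

\[
x_{\mathrm{BB,w}} = s_{0,\mathrm{w}} + s_{1,\mathrm{w}} + s_{2,\mathrm{w}}, \tag{18.68}
\]

where

\[
s_{0,\mathrm{w}} = \sum_{0 \le |\ell| \le T-\nu} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.69}
\]

\[
s_{1,\mathrm{w}} = \sum_{T-\nu < |\ell| \le T+\nu} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.70}
\]

\[
s_{2,\mathrm{w}} = \sum_{T+\nu < |\ell| < \infty} c_\ell\, g_{\ell,\mathrm{w}}, \tag{18.71}
\]

are of corresponding out-of-band energies

\[
e_\kappa^2 = \int_{|f| \ge W/2} \bigl| \hat{s}_{\kappa,\mathrm{w}}(f) \bigr|^2 \, df, \quad \kappa = 0, 1, 2. \tag{18.72}
\]

Note that by (18.64), (18.68), and the Triangle Inequality

\[
e^2 \le (e_0 + e_1 + e_2)^2. \tag{18.73}
\]

Since the integer $\nu \ge 2$ is arbitrary, it follows from (18.73) that, to establish (18.65) and to thus complete the proof of the theorem, it suffices to show that for every fixed integer $\nu \ge 2$,

\[
\lim_{T\to\infty} \frac{1}{T}\, e_0^2 = 0, \tag{18.74}
\]

\[
\lim_{T\to\infty} \frac{1}{T}\, e_1^2 = 0, \tag{18.75}
\]

and that

\[
\lim_{\nu\to\infty} \Bigl( \limsup_{T\to\infty} \frac{1}{T}\, e_2^2 \Bigr) = 0. \tag{18.76}
\]

We thus conclude the theorem's proof by establishing (18.74), (18.75), and (18.76).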
We begin with the easiest, namely (18.75). To establish (18.75) we recall the definition of $e_1$ in (18.72) and (18.70) and use the Triangle Inequality to obtain

\[
\begin{aligned}
e_1 &\le \sum_{T-\nu < |\ell| \le T+\nu} |c_\ell| \Bigl( \int_{|f| \ge W/2} \bigl| \hat{g}_{\ell,\mathrm{w}}(f) \bigr|^2 \, df \Bigr)^{1/2} \\
&\le \gamma \sum_{T-\nu < |\ell| \le T+\nu} \| g_{\ell,\mathrm{w}} \|_2 \\
&\le 4\nu\gamma\, \|g\|_2,
\end{aligned} \tag{18.77}
\]

where the second inequality follows from (18.23) and from Lemma 18.5.4 (ii), and where the final inequality follows because windowing can only reduce energy so $\|g_{\ell,\mathrm{w}}\|_2 \le \|g_\ell\|_2 = \|g\|_2$. Inequality (18.77) establishes (18.75).


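Having established (18.75), we next turn to proving (18.74). The proof is quite similar except that, instead of using Part (ii) of Lemma 18.5.4, we use Part (i) with the substitutions of $g_{\ell,\mathrm{w}}$ for $x$ and of $g_\ell$ for $u$ to obtain

\[
\int_{|f| \ge W/2} \bigl| \hat{g}_{\ell,\mathrm{w}}(f) \bigr|^2 \, df \le \| g_{\ell,\mathrm{w}} - g_\ell \|_2^2, \quad \ell \in \mathbb{Z}. \tag{18.78}
\]

We further upper-bound the RHS of (18.78) using the decay condition (18.58) as

\[
\begin{aligned}
\| g_\ell - g_{\ell,\mathrm{w}} \|_2^2
&= \int_{-\infty}^{\infty} |g_\ell(t)|^2 \, \mathrm{I}\{|t| > T\} \, dt \\
&= \int_{-\infty}^{-T} |g(t - \ell)|^2 \, dt + \int_{T}^{\infty} |g(t - \ell)|^2 \, dt \\
&= \int_{-\infty}^{-T-\ell} |g(s)|^2 \, ds + \int_{T-\ell}^{\infty} |g(s)|^2 \, ds \\
&\le \int_{-\infty}^{-T-\ell} \frac{\beta^2}{|s|^{2+2\alpha}} \, ds + \int_{T-\ell}^{\infty} \frac{\beta^2}{|s|^{2+2\alpha}} \, ds \\
&\le 2 \int_{T-|\ell|}^{\infty} \frac{\beta^2}{s^{2+2\alpha}} \, ds \\
&= \frac{2\beta^2}{1+2\alpha} \, \frac{1}{(T - |\ell|)^{1+2\alpha}}, \quad |\ell| < T,
\end{aligned}
\]

to obtain

\[
\Bigl( \int_{|f| \ge W/2} \bigl| \hat{g}_{\ell,\mathrm{w}}(f) \bigr|^2 \, df \Bigr)^{1/2} \le \frac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \, \frac{1}{(T - |\ell|)^{1/2+\alpha}}, \quad |\ell| < T. \tag{18.79}
\]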


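Using (18.72), (18.69), (18.79), (18.23), and the Triangle Inequality we thus obtain

\[
\begin{aligned}
e_0 &\le \sum_{0 \le |\ell| \le T-\nu} |c_\ell| \Bigl( \int_{|f| \ge W/2} \bigl| \hat{g}_{\ell,\mathrm{w}}(f) \bigr|^2 \, df \Bigr)^{1/2} \\
&\le \gamma\, \frac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \sum_{0 \le |\ell| \le T-\nu} \frac{1}{(T - |\ell|)^{1/2+\alpha}} \\
&\le 2\gamma\, \frac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \sum_{\ell=0}^{T-\nu} \frac{1}{(T - \ell)^{1/2+\alpha}} \\
&= 2\gamma\, \frac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \sum_{\tilde{\ell}=\nu}^{T} \frac{1}{\tilde{\ell}^{\,1/2+\alpha}} \\
&\le 2\gamma\, \frac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \int_{\nu-1}^{T} \frac{1}{\xi^{1/2+\alpha}} \, d\xi \\
&= \begin{cases}
2\gamma\, \dfrac{\sqrt{2}\,\beta}{\sqrt{1+2\alpha}} \, \dfrac{1}{1/2-\alpha} \bigl( T^{1/2-\alpha} - (\nu-1)^{1/2-\alpha} \bigr) & \text{if } \alpha \ne 1/2, \\[2ex]
2\gamma\beta \bigl( \ln T - \ln(\nu - 1) \bigr) & \text{if } \alpha = 1/2,
\end{cases}
\end{aligned} \tag{18.80}
\]

where the inequality in the first line follows from (18.72) and from the Triangle Inequality; the inequality in the second line from (18.23) and (18.79); the inequality in the third line by counting the term $\ell = 0$ twice; the equality in the fourth line by changing the summation variable to $\tilde{\ell} \triangleq T - \ell$; the inequality in the fifth line from the monotonicity of the function $\xi \mapsto \xi^{-1/2-\alpha}$, which implies that

\[
\tilde{\ell}^{\,-1/2-\alpha} \le \int_{\tilde{\ell}-1}^{\tilde{\ell}} \frac{1}{\xi^{1/2+\alpha}} \, d\xi;
\]

and where the final equality on the sixth line follows by direct calculation. Inequality (18.80) combines with our assumption that $\alpha$ is positive to prove (18.74).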
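The last steps leading to (18.80) replace a sum by an integral and then argue that the resulting bound, squared and divided by $T$, vanishes for every positive $\alpha$. The following sketch evaluates both sides for a few assumed parameter values; the function names and the chosen values of $\alpha$, $\beta$, $\gamma$, $\nu$ are illustrative only.

```python
import math

def prefactor(alpha, beta, gamma):
    # Common factor 2 * gamma * sqrt(2) * beta / sqrt(1 + 2 * alpha).
    return 2.0 * gamma * math.sqrt(2.0) * beta / math.sqrt(1.0 + 2.0 * alpha)

def e0_sum(T, nu, alpha, beta=1.0, gamma=1.0):
    """Sum over l = nu, ..., T of l^(-1/2 - alpha), times the prefactor."""
    return prefactor(alpha, beta, gamma) * sum(
        l ** (-0.5 - alpha) for l in range(nu, T + 1)
    )

def e0_bound(T, nu, alpha, beta=1.0, gamma=1.0):
    """Closed-form integral bound, with the two cases alpha != 1/2 and alpha == 1/2."""
    if alpha == 0.5:
        return prefactor(alpha, beta, gamma) * (math.log(T) - math.log(nu - 1))
    p = 0.5 - alpha
    return prefactor(alpha, beta, gamma) * (T ** p - (nu - 1) ** p) / p

for alpha in (0.25, 0.5, 1.0):
    normalized = [e0_bound(T, 2, alpha) ** 2 / T for T in (10**2, 10**4, 10**6)]
    print(alpha, e0_sum(1000, 2, alpha), e0_bound(1000, 2, alpha), normalized)
```

For $\alpha < 1/2$ the bound itself grows with $T$, but only like $T^{1/2-\alpha}$, so its square divided by $T$ still decays like $T^{-2\alpha}$.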


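We now conclude the proof of the theorem by establishing (18.76). To that end, we begin by using Lemma 18.5.4 (ii) and the fact that $s_{2,\mathrm{w}}$ is zero outside the interval $[-T, T]$ to obtain

\[
e_2^2 \le \int_{-T}^{T} |s_{2,\mathrm{w}}(t)|^2 \, dt. \tag{18.81}
\]

We next upper-bound the RHS of (18.81) using the boundedness of the symbols (18.23) and the decay condition (18.58):

\[
\begin{aligned}
|s_{2,\mathrm{w}}(t)| &= \Bigl| \sum_{T+\nu < |\ell| < \infty} c_\ell\, g_{\ell,\mathrm{w}}(t) \Bigr| \\
&\le \gamma \sum_{T+\nu < |\ell| < \infty} |g(t - \ell)| \, \mathrm{I}\{|t| \le T\} \\
&\le \gamma \sum_{T+\nu < |\ell| < \infty} \frac{\beta}{1 + |t - \ell|^{1+\alpha}} \, \mathrm{I}\{|t| \le T\} \\
&\le \gamma \sum_{T+\nu < |\ell| < \infty} \frac{\beta}{1 + \bigl| |\ell| - |t| \bigr|^{1+\alpha}} \, \mathrm{I}\{|t| \le T\} \\
&\le \gamma \sum_{T+\nu < |\ell| < \infty} \frac{\beta}{(|\ell| - T)^{1+\alpha}} \\
&= 2\gamma\beta \sum_{\ell=T+\nu+1}^{\infty} \frac{1}{(\ell - T)^{1+\alpha}} \\
&= 2\gamma\beta \sum_{\tilde{\ell}=\nu+1}^{\infty} \frac{1}{\tilde{\ell}^{\,1+\alpha}} \\
&\le 2\gamma\beta \int_{\nu}^{\infty} \xi^{-1-\alpha} \, d\xi \\
&= \frac{2\gamma\beta}{\alpha} \, \nu^{-\alpha},
\end{aligned} \tag{18.82}
\]

where the equality in the first line follows from the definition of $s_{2,\mathrm{w}}$ (18.71); the inequality in the second line from the Triangle Inequality for Complex Numbers (2.12), the boundedness of $(C_\ell)$ (18.23), and from the definition of $g_{\ell,\mathrm{w}}$ (18.59); the inequality in the third line from (18.58); the inequality in the fourth line because $|\xi - \zeta| \ge \bigl| |\xi| - |\zeta| \bigr|$ whenever $\xi, \zeta \in \mathbb{R}$; the inequality in the fifth line because for $|t| > T$ the LHS is zero and the RHS is positive, and because for $|t| \le T$ we have that $|\ell| - |t| \ge |\ell| - T$ throughout the range of summation; the equality in the sixth line from the symmetry of the summand and from the assumption that $T$ is an integer; the equality in the seventh line by changing the summation variable to $\tilde{\ell} \triangleq \ell - T$; the inequality in the eighth line from the monotonicity of the function $\xi \mapsto \xi^{-1-\alpha}$, which implies that

\[
\frac{1}{\tilde{\ell}^{\,1+\alpha}} \le \int_{\tilde{\ell}-1}^{\tilde{\ell}} \frac{1}{\xi^{1+\alpha}} \, d\xi;
\]

and the final equality in the ninth line by evaluating the integral.
It follows from (18.82) and (18.81) that

\[
e_2^2 \le 2T \Bigl( \frac{2\gamma\beta}{\alpha} \Bigr)^2 \nu^{-2\alpha} \tag{18.83}
\]

and hence that

\[
\limsup_{T\to\infty} \frac{1}{T}\, e_2^2 \le \frac{8\gamma^2\beta^2}{\alpha^2} \, \nu^{-2\alpha},
\]

which proves (18.76).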
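The step from the tail sum to the integral in (18.82) can also be checked numerically. The sketch below compares a truncated tail sum (the truncation length `terms` is an assumption made so that the sum is finite) with the closed-form value $\nu^{-\alpha}/\alpha$ of the integral.

```python
def tail_sum(nu, alpha, terms=200_000):
    """Truncated version of the sum over l >= nu + 1 of l^(-1 - alpha)."""
    return sum(l ** (-1.0 - alpha) for l in range(nu + 1, nu + 1 + terms))

def tail_bound(nu, alpha):
    """Value of the integral of xi^(-1 - alpha) over [nu, infinity)."""
    return nu ** (-alpha) / alpha

for alpha in (0.5, 1.0, 2.0):
    for nu in (2, 5, 20):
        print(alpha, nu, tail_sum(nu, alpha), tail_bound(nu, alpha))
```

Since every omitted term is positive, the truncated sum underestimates the full tail, so this comparison is a consistency check on the direction of the inequality rather than a proof; it also shows the bound shrinking as $\nu$ grows, which is what the outer limit over $\nu$ in (18.76) exploits.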
In this section we justify the derivations of Section 18.4.


18.6.1 On Limits of Convolutions 


We begin with a lemma that justifies the swapping of infinite summation and convolution. As a corollary we establish conditions under which feeding a (real or complex) PAM signal of pulse shape $g$ to a stable filter of impulse response $h$ is tantamount to replacing its pulse shape $g$ with the new pulse shape $g \star h$.


Lemma 18.6.1. Let $s_1, s_2, \ldots$ be a sequence of measurable functions from $\mathbb{R}$ to $\mathbb{C}$ satisfying the following two conditions:


1) The sequence is uniformly bounded in the sense that there exists some positive number $\sigma_\infty$ such that

\[
|s_\ell(t)| \le \sigma_\infty, \quad t \in \mathbb{R}, \ \ell = 1, 2, \ldots \tag{18.84}
\]

2) The sequence converges to some function $s$ uniformly over compact sets in the sense that for every fixed $\xi > 0$

\[
\lim_{\ell\to\infty} \sup_{|t| \le \xi} |s(t) - s_\ell(t)| = 0. \tag{18.85}
\]

Then for every $h \in \mathcal{L}_1$,

\[
\lim_{\ell\to\infty} (s_\ell \star h)(t) = (s \star h)(t), \quad t \in \mathbb{R}. \tag{18.86}
\]

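Proof. Fix some epoch $t_0 \in \mathbb{R}$ and some $h \in \mathcal{L}_1$. We will show that for every $\epsilon > 0$ there exists some $L_0 \in \mathbb{N}$ (depending on $\epsilon$) such that

\[
\bigl| (s_\ell \star h)(t_0) - (s \star h)(t_0) \bigr| < \epsilon, \quad \ell \ge L_0. \tag{18.87}
\]

To that end note that our assumption that $h$ is integrable implies that there exists some $\xi > 0$ such that

\[
\sigma_\infty \int_{|\tau| > \xi} |h(\tau)| \, d\tau < \frac{\epsilon}{3}. \tag{18.88}
\]

And when we apply our assumption that the sequence $s_1, s_2, \ldots$ converges to $s$ uniformly over compact sets to the compact interval $[t_0 - \xi, t_0 + \xi]$, we obtain that there exists some $L_0$ (depending on $\epsilon$, $t_0$, and $\xi$) such that

\[
\|h\|_1 \sup_{t_0 - \xi \le \tau \le t_0 + \xi} |s(\tau) - s_\ell(\tau)| < \frac{\epsilon}{3}, \quad \ell \ge L_0. \tag{18.89}
\]

We can now derive (18.87) as follows:

\[
\begin{aligned}
\bigl| (s_\ell \star h)(t_0) - (s \star h)(t_0) \bigr|
&= \Bigl| \int_{-\infty}^{\infty} s_\ell(t_0 - \tau)\, h(\tau) \, d\tau - \int_{-\infty}^{\infty} s(t_0 - \tau)\, h(\tau) \, d\tau \Bigr| \\
&\le \Bigl| \int_{-\xi}^{\xi} s_\ell(t_0 - \tau)\, h(\tau) \, d\tau - \int_{-\xi}^{\xi} s(t_0 - \tau)\, h(\tau) \, d\tau \Bigr| \\
&\quad + \Bigl| \int_{|\tau| > \xi} s(t_0 - \tau)\, h(\tau) \, d\tau \Bigr| + \Bigl| \int_{|\tau| > \xi} s_\ell(t_0 - \tau)\, h(\tau) \, d\tau \Bigr| \\
&\le \int_{-\xi}^{\xi} \bigl| s_\ell(t_0 - \tau) - s(t_0 - \tau) \bigr| \, |h(\tau)| \, d\tau \\
&\quad + \int_{|\tau| > \xi} |s(t_0 - \tau)| \, |h(\tau)| \, d\tau + \int_{|\tau| > \xi} |s_\ell(t_0 - \tau)| \, |h(\tau)| \, d\tau \\
&\le \|h\|_1 \Bigl( \sup_{t_0 - \xi \le \tau \le t_0 + \xi} |s_\ell(\tau) - s(\tau)| \Bigr) + 2\sigma_\infty \int_{|\tau| > \xi} |h(\tau)| \, d\tau \\
&< \epsilon,
\end{aligned}
\]

where the last inequality follows from (18.88) and (18.89).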
Corollary 18.6.2. If the sequence $(C_\ell)$ is bounded in the sense of (18.23) and if the measurable function $g$ satisfies the decay condition (18.22), then for every $h \in \mathcal{L}_1$ and every epoch $t_0 \in \mathbb{R}$

\[
\Bigl( \Bigl( t \mapsto \sum_{\ell\in\mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star h \Bigr)(t_0) = \sum_{\ell\in\mathbb{Z}} C_\ell\, (g \star h)(t_0 - \ell T_s). \tag{18.90}
\]


Proof. Follows by applying Lemma 18.6.1 to the functions

\[
s_L \colon t \mapsto \sum_{\ell=-L}^{L} C_\ell\, g(t - \ell T_s), \quad L = 1, 2, \ldots
\]
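For finitely many symbols the identity in (18.90) is just the linearity of convolution, and a discrete-time sketch makes this concrete. Everything below is an illustrative assumption (the sampling factor `sps`, the Hann-window pulse, the random filter taps); the substance of the corollary lies in the infinite sum, which this finite example does not address.

```python
import numpy as np

rng = np.random.default_rng(2)
sps = 8                                   # assumed samples per symbol period
symbols = rng.choice([-1.0, 1.0], 50) + 1j * rng.choice([-1.0, 1.0], 50)
g = np.hanning(4 * sps + 1)               # hypothetical pulse shape samples
h = rng.standard_normal(3 * sps)          # hypothetical filter impulse response

def pam(symbols, pulse):
    """Sum of symbol-weighted copies of the pulse, spaced sps samples apart."""
    out = np.zeros((len(symbols) - 1) * sps + len(pulse), dtype=complex)
    for ell, c in enumerate(symbols):
        out[ell * sps: ell * sps + len(pulse)] += c * pulse
    return out

# Filter the PAM signal built from g ...
filtered_signal = np.convolve(pam(symbols, g), h)
# ... and compare with a PAM signal built directly from the pulse g * h.
signal_with_new_pulse = pam(symbols, np.convolve(g, h))

print(np.max(np.abs(filtered_signal - signal_with_new_pulse)))
```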


18.6.2 On the Support of the Operational PSD of $X_{\mathrm{BB}}$


We next prove that if the pulse shape $g$ is bandlimited to $W/2$ Hz, then the operational PSD of $X_{\mathrm{BB}}$ is zero at frequencies outside the band $[-W/2, W/2]$. That is, we justify (18.46).


Proposition 18.6.3. Assume that $A$, $T_s$, $g$, $W$, and $f_c$ are as in Theorem 18.2.1 and, additionally, that $g$ satisfies the decay condition (18.22) and that the CSP $(C_\ell)$ is bounded in the sense of (18.23). If the CSP $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ of (18.25) is of operational PSD $S_{\mathrm{BB}}(\cdot)$, then $S_{\mathrm{BB}}(f)$ is zero for all $|f| > W/2$ outside a set of Lebesgue measure zero, and consequently

\[
f \mapsto S_{\mathrm{BB}}(f)\, \mathrm{I}\{|f| \le W/2\}
\]

is also an operational PSD for $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$.


Proof. We shall show that the proposition's hypotheses imply that if $h \in \mathcal{L}_1$ is such that $\hat{h}(f) = 0$ at all frequencies $f$ satisfying $|f| \le W/2$, then the power in $X_{\mathrm{BB}} \star h$ is zero, irrespective of the values of $\hat{h}(f)$ at other frequencies. That is, we shall show that

\[
\bigl( \hat{h}(f) = 0, \ |f| \le W/2 \bigr) \implies \bigl( \text{Power in } X_{\mathrm{BB}} \star h = 0 \bigr), \quad h \in \mathcal{L}_1. \tag{18.91}
\]

Since $X_{\mathrm{BB}}$ is, by assumption, of operational PSD $S_{\mathrm{BB}}(\cdot)$, it will then follow from (18.91) that

\[
\bigl( \hat{h}(f) = 0, \ |f| \le W/2 \bigr) \implies \Bigl( \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}(f)|^2 \, df = 0 \Bigr), \quad h \in \mathcal{L}_1. \tag{18.92}
\]

From (18.92) it is just a technicality to show that the nonnegative function $S_{\mathrm{BB}}(\cdot)$ must be zero at all frequencies $|f| > W/2$ outside a set of Lebesgue measure zero. Indeed, if, in order to reach a contradiction, we assume that $S_{\mathrm{BB}}(\cdot)$ is not indistinguishable from the all-zero function in some interval $[a, b]$, where $a$ and $b$ are such that $W/2 < a < b$, then picking $h$ as an integrable function such that $\hat{h}(f)$ is zero for $|f| \le W/2$ and such that $\hat{h}(f) = 1$ for $a \le f \le b$ would yield a contradiction to (18.92). (An example of such a function $h$ is the IFT of the shifted-trapezoid mapping

\[
f \mapsto \begin{cases}
0 & \text{if } f \le W/2 \text{ or } f \ge b + (a - W/2), \\
1 & \text{if } a \le f \le b, \\
1 - \dfrac{\bigl| f - \frac{a+b}{2} \bigr| - \frac{b-a}{2}}{a - W/2} & \text{otherwise},
\end{cases} \quad f \in \mathbb{R},
\]

which is a frequency shifted version of the function we encountered in (7.15) and (7.17).) The assumption that $S_{\mathrm{BB}}(\cdot)$ is not indistinguishable from the all-zero function in some interval $[a, b]$ where $a < b < -W/2$ can be similarly contradicted.


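To complete the proof we thus need to justify (18.91). This follows from two observations. The first is that, by Corollary 18.6.2, for every $h \in \mathcal{L}_1$

\[
\text{Power in } X_{\mathrm{BB}} \star h = \text{Power in } t \mapsto A \sum_{\ell\in\mathbb{Z}} C_\ell\, (g \star h)(t - \ell T_s). \tag{18.93}
\]

The second is that, because $g$ is an integrable function that is bandlimited to $W/2$ Hz, it follows from Proposition 6.5.2 that

\[
(g \star h)(t) = \int_{-W/2}^{W/2} \hat{g}(f)\, \hat{h}(f)\, e^{i2\pi f t} \, df, \quad t \in \mathbb{R},
\]

and, in particular,

\[
\bigl( \hat{h}(f) = 0, \ |f| \le W/2 \bigr) \implies \bigl( g \star h = 0 \bigr), \quad h \in \mathcal{L}_1. \tag{18.94}
\]

Combining (18.93) and (18.94) establishes (18.91).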


18.6.3 On the Definition of the Operational PSD 


In order to demonstrate that $(Z(t),\ t \in \mathbb{R})$ is of operational PSD $S_{ZZ}$, one has to show that (18.43) holds for every function $h \colon \mathbb{R} \to \mathbb{C}$ in $\mathcal{L}_1$ (Definition 18.4.1). It turns out that it suffices to establish (18.43) only for functions that are in a subset of $\mathcal{L}_1$, provided that the subset is sufficiently rich. This result will allow us to consider only functions $h$ of compact support. To make this result precise we need the following definition. We say that the set $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$ if $\mathcal{H}$ is a subset of $\mathcal{L}_1$ such that for every $h \in \mathcal{L}_1$ there exists a sequence $h_1, h_2, \ldots$ of elements of $\mathcal{H}$ such that $\lim_{\nu\to\infty} \|h - h_\nu\|_1 = 0$. An example of a dense subset of $\mathcal{L}_1$ is the subset of functions of compact support, where a function $h \colon \mathbb{R} \to \mathbb{C}$ is said to be of compact support if there exists some $\Delta > 0$ such that

\[
h(t) = 0, \quad |t| > \Delta. \tag{18.95}
\]

Lemma 18.6.4 (On Functions of Compact Support).


(i) The set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$.


(ii) If $h$ is of compact support and if $g$ satisfies the decay condition (18.22) with parameters $\alpha, \beta, T_s > 0$, then $g \star h$ also satisfies this decay condition with the same parameters $\alpha$ and $T_s$ but with a possibly different parameter $\beta'$.
Proof. We begin with Part (i). Given any integrable function $h$ (not necessarily of compact support) we define the sequence of integrable functions of compact support $h_1, h_2, \ldots$ by $h_\nu \colon t \mapsto h(t)\, \mathrm{I}\{|t| \le \nu\}$ for every $\nu \in \mathbb{N}$. It is then just a technicality to show that $\|h - h_\nu\|_1$ converges to zero. (This can be shown using the Dominated Convergence Theorem because $|h_\nu(t)| \le |h(t)|$ for all $t \in \mathbb{R}$ and because $h$ is integrable.)


We next prove Part (ii). Let $g$ satisfy the decay condition (18.22) with the positive parameters $\alpha$, $\beta$, $T_s$, and let $\Delta > 0$ be such that (18.95) is satisfied. We shall prove the lemma by showing that

\[
|(g \star h)(t)| \le \frac{\beta'}{1 + (|t|/T_s)^{1+\alpha}}, \quad t \in \mathbb{R}, \tag{18.96}
\]

where

\[
\beta' = \beta\, \|h\|_1\, 2^{1+\alpha} \bigl( 1 + (2\Delta/T_s)^{1+\alpha} \bigr). \tag{18.97}
\]

To that end we shall first show that

\[
|(g \star h)(t)| \le \beta\, \|h\|_1, \quad t \in \mathbb{R} \tag{18.98}
\]

and

\[
|(g \star h)(t)| \le \beta\, \|h\|_1\, 2^{1+\alpha}\, \frac{1}{1 + (|t|/T_s)^{1+\alpha}}, \quad |t| \ge 2\Delta. \tag{18.99}
\]

We shall then proceed to show that the RHS of (18.96) is larger than the RHS of (18.98) for $|t| \le 2\Delta$ and that it is larger than the RHS of (18.99) for $|t| > 2\Delta$.


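Both (18.98) and (18.99) follow from the bound

\[
\begin{aligned}
|(g \star h)(t)| &= \Bigl| \int_{t-\Delta}^{t+\Delta} g(\tau)\, h(t - \tau) \, d\tau \Bigr| \\
&\le \int_{t-\Delta}^{t+\Delta} |g(\tau)| \, |h(t - \tau)| \, d\tau \\
&\le \int_{-\Delta}^{\Delta} \Bigl( \sup_{t-\Delta \le \sigma \le t+\Delta} |g(\sigma)| \Bigr) |h(\tau)| \, d\tau \\
&= \|h\|_1 \sup_{t-\Delta \le \sigma \le t+\Delta} |g(\sigma)|
\end{aligned}
\]

as follows. Bound (18.98) simply follows by using (18.22) to upper-bound $|g(\sigma)|$ by $\beta$. And Bound (18.99) follows by using (18.22) to upper-bound $|g(\sigma)|$, with $\sigma$ ranging over $[t - \Delta, t + \Delta]$ and $|t| \ge \Delta$, by $\beta / \bigl(1 + ((|t| - \Delta)/T_s)^{1+\alpha}\bigr)$, and by then upper-bounding this latter expression in the range $|t| \ge 2\Delta$ by $\beta\, 2^{1+\alpha} / \bigl(1 + (|t|/T_s)^{1+\alpha}\bigr)$ because in this range

\[
1 + \Bigl( \frac{|t| - \Delta}{T_s} \Bigr)^{1+\alpha} \ge 1 + \Bigl( \frac{|t|}{2 T_s} \Bigr)^{1+\alpha} \ge 2^{-(1+\alpha)} \Bigl( 1 + \Bigl( \frac{|t|}{T_s} \Bigr)^{1+\alpha} \Bigr).
\]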
Having established (18.98) and (18.99) we now complete the proof by showing that the RHS of (18.96) upper-bounds the RHS of (18.98) whenever $|t| \le 2\Delta$, and that it upper-bounds the RHS of (18.99) for $|t| > 2\Delta$. That the RHS of (18.96) upper-bounds the RHS of (18.98) whenever $|t| \le 2\Delta$ follows because

\[
\frac{\beta\, \|h\|_1\, 2^{1+\alpha} \bigl( 1 + (2\Delta/T_s)^{1+\alpha} \bigr)}{1 + (|t|/T_s)^{1+\alpha}} \ge \beta\, \|h\|_1\, 2^{1+\alpha} \ge \beta\, \|h\|_1, \quad |t| \le 2\Delta.
\]

And that the RHS of (18.96) upper-bounds the RHS of (18.99) whenever $|t| > 2\Delta$ follows because the term $1 + (2\Delta/T_s)^{1+\alpha}$ is larger than one.
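Part (ii) of the lemma can be illustrated on a grid. In the sketch below the pulse is taken to meet the decay condition (18.22) with equality, the filter is a rectangle of half-width `Delta`, and the parameter values are assumptions chosen for illustration; the convolution is approximated by a Riemann sum, and the pulse is truncated to the grid, which can only reduce the convolution.

```python
import numpy as np

alpha, beta, Ts, Delta = 1.0, 1.0, 1.0, 2.0      # assumed parameters
t = np.linspace(-60.0, 60.0, 6001)               # symmetric grid, odd length
dt = t[1] - t[0]

def envelope(t, b):
    # Decay envelope b / (1 + (|t| / Ts)^(1 + alpha)) appearing in (18.22) and (18.96).
    return b / (1.0 + (np.abs(t) / Ts) ** (1.0 + alpha))

g = envelope(t, beta)                            # extremal pulse for (18.22)
h = np.where(np.abs(t) <= Delta, 1.0, 0.0)       # rectangular h of compact support

h_l1 = np.sum(np.abs(h)) * dt                    # Riemann-sum value of ||h||_1
conv = np.convolve(g, h, mode="same") * dt       # samples of (g * h)(t) on the grid

# beta' as in (18.97).
beta_prime = beta * h_l1 * 2.0 ** (1.0 + alpha) * (1.0 + (2.0 * Delta / Ts) ** (1.0 + alpha))

print(conv.max(), beta_prime)
print(np.all(conv <= envelope(t, beta_prime)))
```

The constant $\beta'$ of (18.97) is far from tight in this example; the lemma only needs some finite $\beta'$ so that Theorem 18.5.2 can later be applied to the filtered pulse.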


Proposition 18.6.5. Assume that $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$ and that the (real or complex) measurable stochastic process $(Z(t),\ t \in \mathbb{R})$ is bounded in the sense that for some $\sigma_\infty$

\[
|Z(t)| \le \sigma_\infty, \quad t \in \mathbb{R}. \tag{18.100}
\]

If $S(\cdot)$ is a nonnegative integrable function such that the relation

\[
\text{Power in } Z \star h = \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2 \, df \tag{18.101}
\]

holds for every $h \in \mathcal{H}$, then it holds for all $h \in \mathcal{L}_1$.
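Proof. Let $h$ be an element of $\mathcal{L}_1$ (but not necessarily of $\mathcal{H}$) for which we would like to prove (18.101). Since $\mathcal{H}$ is a dense subset of $\mathcal{L}_1$, there exists a sequence $h_1, h_2, \ldots$ of elements of $\mathcal{H}$

\[
h_\nu \in \mathcal{H}, \quad \nu = 1, 2, \ldots \tag{18.102}
\]

such that

\[
\lim_{\nu\to\infty} \|h - h_\nu\|_1 = 0. \tag{18.103}
\]

We shall prove that (18.101) holds for $h$ by justifying the calculation

\[
\text{Power in } Z \star h = \lim_{\nu\to\infty} \text{Power in } Z \star h_\nu \tag{18.104}
\]

\[
= \lim_{\nu\to\infty} \int_{-\infty}^{\infty} S(f)\, |\hat{h}_\nu(f)|^2 \, df \tag{18.105}
\]

\[
= \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2 \, df. \tag{18.106}
\]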

The justification of (18.105) is that, by (18.102), each of the functions $h_\nu$ is in $\mathcal{H}$, and the proposition's hypothesis guarantees that (18.101) holds for such functions.


The justification of (18.106) is a bit technical. It is based on noting that (18.103) implies (by Theorem 6.2.11 (i) with the substitution of $h - h_\nu$ for $x$) that

\[
\lim_{\nu\to\infty} \hat{h}_\nu(f) = \hat{h}(f), \quad f \in \mathbb{R} \tag{18.107}
\]

and by then using the Dominated Convergence Theorem to justify the swapping of the limit and integral. Indeed, (by Theorem 6.2.11 (i)) for every $\nu \in \mathbb{N}$, the function $f \mapsto S(f)\, |\hat{h}_\nu(f)|^2$ is bounded by the function $f \mapsto \bigl( \sup_\nu \|h_\nu\|_1 \bigr)^2 S(f)$, which is integrable because $S(\cdot)$ is integrable (by the proposition's hypothesis) and because the integrability of $h$ and (18.103) imply that the supremum is finite as can be verified using the Triangle Inequality by writing $h_\nu$ as $h - (h - h_\nu)$.


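We now complete the proof by justifying (18.104). Since $Z \star h_\nu = Z \star h - Z \star (h - h_\nu)$, it follows from the Triangle Inequality for Stochastic Processes (Proposition 18.5.1) that for every $T > 0$

\[
\begin{aligned}
\left| \sqrt{ \mathrm{E}\Bigl[ \int_{-T}^{T} |(Z \star h)(t)|^2 \, dt \Bigr] } - \sqrt{ \mathrm{E}\Bigl[ \int_{-T}^{T} |(Z \star h_\nu)(t)|^2 \, dt \Bigr] } \right|
&\le \sqrt{ \mathrm{E}\Bigl[ \int_{-T}^{T} \bigl| \bigl( Z \star (h - h_\nu) \bigr)(t) \bigr|^2 \, dt \Bigr] } \\
&\le \sqrt{2T}\, \sigma_\infty\, \|h - h_\nu\|_1,
\end{aligned} \tag{18.108}
\]

where the second inequality follows from (18.100) using (5.8c). Upon dividing by $\sqrt{2T}$ and taking the limit of $T \to \infty$, it now follows from (18.108) that

\[
\Bigl| \sqrt{\text{Power in } Z \star h_\nu} - \sqrt{\text{Power in } Z \star h} \Bigr| \le \sigma_\infty\, \|h - h_\nu\|_1,
\]

from which (18.104) follows by (18.103).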


18.6.4 Relating the Operational PSD in Passband and Baseband 


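We next make the relationship (18.44) between the operational PSD of $X$ and the operational PSD of $X_{\mathrm{BB}}$ formal.


Theorem 18.6.6. Under the assumptions of Proposition 18.6.3, if the complex stochastic process $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ of (18.25) is of operational PSD $S_{\mathrm{BB}}(\cdot)$ in the sense that $S_{\mathrm{BB}}(\cdot)$ is an integrable function satisfying that for every complex $h_c \in \mathcal{L}_1$,

\[
\lim_{T\to\infty} \frac{1}{2T}\, \mathrm{E}\Bigl[ \int_{-T}^{T} \bigl| (X_{\mathrm{BB}} \star h_c)(t) \bigr|^2 \, dt \Bigr] = \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_c(f)|^2 \, df, \tag{18.109}
\]

then the QAM real SP $(X(t),\ t \in \mathbb{R})$ of (18.24) is of operational PSD

\[
S_{\mathrm{PB}}(f) = S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c), \quad f \in \mathbb{R} \tag{18.110}
\]

in the sense that $S_{\mathrm{PB}}(\cdot)$ is an integrable symmetric function such that for every real $h_r \in \mathcal{L}_1$

\[
\lim_{T\to\infty} \frac{1}{2T}\, \mathrm{E}\Bigl[ \int_{-T}^{T} \bigl| (X \star h_r)(t) \bigr|^2 \, dt \Bigr] = \int_{-\infty}^{\infty} S_{\mathrm{PB}}(f)\, |\hat{h}_r(f)|^2 \, df. \tag{18.111}
\]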


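Proof. The hypothesis that $S_{\mathrm{BB}}(\cdot)$ is integrable clearly implies that $S_{\mathrm{PB}}(\cdot)$, as defined in (18.110), is integrable and symmetric. It remains to show that if (18.109) holds for every complex $h_c \in \mathcal{L}_1$, then (18.111) must hold for every real $h_r \in \mathcal{L}_1$. Since the set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$ (Lemma 18.6.4 (i)), it follows from Proposition 18.6.5 that it suffices to establish (18.111) for real functions $h_r$ that are of compact support. Let $h_r$ be such a function. The following calculation demonstrates that passing the QAM signal $X$ through a filter of impulse response $h_r$ is tantamount to replacing its pulse shape $g$ with the pulse shape consisting of the convolution of $g$ with the complex signal $\tau \mapsto e^{-i2\pi f_c \tau}\, h_r(\tau)$:

\[
\begin{aligned}
(X \star h_r)(t) &= \Bigl( \bigl( \tau \mapsto 2\,\mathrm{Re}\bigl( X_{\mathrm{BB}}(\tau)\, e^{i2\pi f_c \tau} \bigr) \bigr) \star h_r \Bigr)(t) \\
&= 2\,\mathrm{Re}\Bigl( \bigl( \bigl( \tau \mapsto X_{\mathrm{BB}}(\tau)\, e^{i2\pi f_c \tau} \bigr) \star h_r \bigr)(t) \Bigr) \\
&= 2\,\mathrm{Re}\Bigl( e^{i2\pi f_c t} \bigl( X_{\mathrm{BB}} \star \bigl( \tau \mapsto e^{-i2\pi f_c \tau}\, h_r(\tau) \bigr) \bigr)(t) \Bigr) \\
&= 2\,\mathrm{Re}\Bigl( e^{i2\pi f_c t}\, A \sum_{\ell=-\infty}^{\infty} C_\ell \bigl( g \star \bigl( \tau \mapsto e^{-i2\pi f_c \tau}\, h_r(\tau) \bigr) \bigr)(t - \ell T_s) \Bigr) \\
&= 2\,\mathrm{Re}\Bigl( A \sum_{\ell=-\infty}^{\infty} C_\ell\, (g \star h_c)(t - \ell T_s)\, e^{i2\pi f_c t} \Bigr),
\end{aligned} \tag{18.112}
\]

where the first equality follows from the definition of $X$ in terms of $X_{\mathrm{BB}}$; the second because $h_r$ is real (see (7.38) on the convolution between a real and a complex signal); the third from Proposition 7.8.1; the fourth from Corollary 18.6.2; and where the fifth equality follows by defining the mapping

\[
h_c \colon t \mapsto e^{-i2\pi f_c t}\, h_r(t). \tag{18.113}
\]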


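Note that by (18.113)

\[
\hat{h}_c(f) = \hat{h}_r(f + f_c), \quad f \in \mathbb{R}. \tag{18.114}
\]

It follows from (18.112) that $X \star h_r$ has the form of a QAM signal with pulse shape $g \star h_c$. We note that, because $g$ (by hypothesis) satisfies the decay condition (18.22) and because the fact that $h_r$ is of compact support implies by (18.113) that $h_c$ is also of compact support, it follows from Lemma 18.6.4 (ii) that the pulse shape $g \star h_c$ satisfies the decay condition

\[
|(g \star h_c)(t)| \le \frac{\beta'}{1 + |t/T_s|^{1+\alpha}}, \quad t \in \mathbb{R} \tag{18.115}
\]

for some positive $\beta'$. Consequently, we can apply Theorem 18.5.2 to obtain that the power of $X \star h_r$ is given by

\[
\begin{aligned}
\text{Power in } X \star h_r &= 2\, \text{Power in } t \mapsto A \sum_{\ell=-\infty}^{\infty} C_\ell\, (g \star h_c)(t - \ell T_s) \\
&= 2\, \text{Power in } \Bigl( t \mapsto A \sum_{\ell=-\infty}^{\infty} C_\ell\, g(t - \ell T_s) \Bigr) \star h_c \\
&= 2\, \text{Power in } (X_{\mathrm{BB}} \star h_c) \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_c(f)|^2 \, df \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(f)\, |\hat{h}_r(f + f_c)|^2 \, df \\
&= 2 \int_{-\infty}^{\infty} S_{\mathrm{BB}}(\tilde{f} - f_c)\, |\hat{h}_r(\tilde{f})|^2 \, d\tilde{f} \\
&= \int_{-\infty}^{\infty} \bigl( S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c) \bigr) |\hat{h}_r(f)|^2 \, df,
\end{aligned} \tag{18.116}
\]

where the first equality follows from Theorem 18.5.2; the second from Corollary 18.6.2; the third by the definition of $X_{\mathrm{BB}}$; the fourth because, by hypothesis, $X_{\mathrm{BB}}$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$; the fifth from (18.114); the sixth by changing the integration variable to $\tilde{f} \triangleq f + f_c$; and the seventh from the conjugate symmetry of $\hat{h}_r(\cdot)$.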


Since h, was an arbitrary integrable real function of compact support, (18.116) 
establishes (18.111) for all such functions. 


Corollary 18.6.7. Under the assumptions of Theorem 18.6.6, the QAM signal $(X(t),\ t \in \mathbb{R})$ is of operational PSD

\[
S_{XX}(f) = S_{\mathrm{BB}}(|f| - f_c), \quad f \in \mathbb{R}. \tag{18.117}
\]


Proof. Follows from the theorem by noting that, by Proposition 18.6.3 and by the assumption that $f_c > W/2$,

\[
S_{\mathrm{BB}}(f - f_c) + S_{\mathrm{BB}}(-f - f_c) = S_{\mathrm{BB}}(|f| - f_c)
\]

at all frequencies $f$ outside a set of frequencies of Lebesgue measure zero.
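The identity used in this proof is easy to check on a grid. In the sketch below the baseband PSD `S_bb` is a hypothetical, deliberately asymmetric function supported on $[-W/2, W/2]$, and `W` and `fc` are assumed values with $f_c > W/2$; a second carrier below $W/2$ shows that the identity can fail without that assumption.

```python
import numpy as np

W, fc = 2.0, 3.0                 # assumed values with fc > W/2

def S_bb(f):
    # Hypothetical baseband PSD: nonnegative, asymmetric, zero outside [-W/2, W/2].
    f = np.asarray(f, dtype=float)
    return np.where(np.abs(f) <= W / 2, 1.0 + 0.5 * f, 0.0)

f = np.linspace(-6.0, 6.0, 2401)

two_term = S_bb(f - fc) + S_bb(-f - fc)      # expression in (18.110)
folded = S_bb(np.abs(f) - fc)                # expression in (18.117)
print(np.allclose(two_term, folded))

# With a carrier frequency below W/2 the two translates overlap near f = 0.
fc_low = 0.5
overlap = S_bb(f - fc_low) + S_bb(-f - fc_low)
folded_low = S_bb(np.abs(f) - fc_low)
print(np.allclose(overlap, folded_low))
```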


18.6.5 On the Operational PSD in Baseband 


In the calculation of the operational PSD of the QAM signal $(X(t))$ via (18.44) (which is formally stated as Corollary 18.6.7) we needed the operational PSD of the CSP $(X_{\mathrm{BB}}(t))$ of (18.25). In this section we justify the calculations of this operational PSD that lead to Theorems 18.4.3 and 18.4.4. Specifically, we show:


Proposition 18.6.8 (Operational PSD of a Complex PAM Signal). Let the CSP $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ be given by (18.25), where $A > 0$, $T_s > 0$, and where $g$ is a complex Borel measurable function satisfying the decay condition (18.22) for some constants $\alpha, \beta > 0$.


(i) If $(C_\ell)$ is a bounded, zero-mean, WSS CSP of autocovariance function $K_{CC}$, i.e., if it satisfies (18.23) and (18.28), then the CSP $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$ as given in (18.49).


(ii) If $(C_\ell)$ is produced in bi-infinite block-mode from IID random bits using an encoder $\mathrm{enc} \colon \{0,1\}^K \to \mathbb{C}^N$ that produces zero-mean symbols from IID random bits, then $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$ as given in (18.52).
Proof. We have all the ingredients that are needed to justify our derivations of (18.49) and (18.52). All that remains is to piece them together. Let $h$ be any complex integrable function of compact support. Then

\[
\begin{aligned}
\text{Power in } X_{\mathrm{BB}} \star h &= \text{Power in } \Bigl( \Bigl( t \mapsto A \sum_{\ell\in\mathbb{Z}} C_\ell\, g(t - \ell T_s) \Bigr) \star h \Bigr) \\
&= \text{Power in } t \mapsto A \sum_{\ell\in\mathbb{Z}} C_\ell\, (g \star h)(t - \ell T_s),
\end{aligned} \tag{18.118}
\]

where the first equality follows from the definition of $X_{\mathrm{BB}}$ (18.25), and where the second equality follows from Corollary 18.6.2. Note that by Lemma 18.6.4 (ii) the function $g \star h$ satisfies the decay condition (18.96) for some $\beta' > 0$.


To prove Part (i) we now employ Theorem 14.6.4 (which extends to the case where the pulse shape and the symbols are complex) with the pulse shape $g \star h$ to obtain from (18.118) that

\[
\text{Power in } X_{\mathrm{BB}} \star h = \frac{A^2}{T_s} \int_{-\infty}^{\infty} \sum_{m=-\infty}^{\infty} K_{CC}(m)\, e^{-i2\pi f m T_s}\, |\hat{g}(f)|^2\, |\hat{h}(f)|^2 \, df, \tag{18.119}
\]

for every integrable complex $h$ of compact support. It follows from the fact that the set of integrable functions of compact support is a dense subset of $\mathcal{L}_1$ (Lemma 18.6.4 (i)) and from Proposition 18.6.5 that (18.119) must hold for all integrable functions. Recalling the definition of the operational PSD (Definition 18.4.1), it follows that $(X_{\mathrm{BB}}(t),\ t \in \mathbb{R})$ is of operational PSD $S_{\mathrm{BB}}(\cdot)$ as given in (18.49).


The proof of Part (ii) is very similar except that we compute the RHS of (18.118) using (18.36) with the substitution of $g \star h$ for the pulse shape.


18.7 Exercises 


Exercise 18.1 (The Second Moment of the Square QAM Constellation).
(i) Show that picking $X$ and $Y$ IID uniformly over the set in (10.19) results in $X + iY$ being uniformly distributed over the set in (16.19).
(ii) Compute the second moment of the square $2\nu \times 2\nu$ QAM constellation (16.19).


Exercise 18.2 (Optimal Constellations). Let $\mathcal{C}$ denote a QAM constellation, and define for every $z \in \mathbb{C}$ the constellation $\mathcal{C}' = \{c - z : c \in \mathcal{C}\}$.

(i) Relate the minimum distance of $\mathcal{C}'$ to that of $\mathcal{C}$.

(ii) Relate the second moment of $\mathcal{C}'$ to that of $\mathcal{C}$.

(iii) How would you choose $z$ to minimize the second moment of $\mathcal{C}'$?


Exercise 18.3 (The Power in Baseband Is Real). Show that the RHS of (18.29) is real. Which properties of the autocovariance function $K_{CC}$ and of the self-similarity function $R_{gg}$ are you exploiting?
Exercise 18.4 ($\pi/4$-QPSK). In QPSK or 4-QAM the data bits are mapped to complex symbols $(C_\ell)$ which take value in the set $\{\pm 1 \pm i\}$ and which are then transmitted using the signal $(X(t))$ defined in (18.24). Consider now $\pi/4$-QPSK where, prior to transmission, the complex symbols $(C_\ell)$ are rotated to form the complex symbols

\[
\tilde{C}_\ell = \alpha^\ell\, C_\ell, \quad \ell \in \mathbb{Z},
\]

where $\alpha = e^{i\pi/4}$. The transmitted signal is then

\[
2A\, \mathrm{Re}\Bigl( \sum_{\ell=-\infty}^{\infty} \tilde{C}_\ell\, g(t - \ell T_s)\, e^{i2\pi f_c t} \Bigr), \quad t \in \mathbb{R}.
\]

Compute the power and the operational PSD of the $\pi/4$-QPSK signal when $(C_\ell)$ is a zero-mean WSS CSP of autocovariance function $K_{CC}$. Compare the power and operational PSD of $\pi/4$-QPSK with those of QPSK. How do they compare when the symbols $(C_\ell)$ are IID?

Hint: See Exercise 17.12.


Exercise 18.5 (The Bandwidth of the QAM Signal). Formulate and prove a result anal- 
ogous to Theorem 15.4.1 for QAM. 


Exercise 18.6 (Bandwidth and Power in PAM and QAM). Data bits $(D_j)$ are generated at rate $R_b$ bits per second.

(i) The bits are mapped to real symbols using a $(K, N)$ binary-to-reals block-encoder of rate $K/N$ bits per real symbol. The symbols are mapped to a PAM signal of pulse shape $\phi$ whose time shifts by integer multiples of $T_s$ are orthonormal and whose excess bandwidth is $\eta$. Find the bandwidth of the transmitted signal (Definition 15.3.4).

(ii) Repeat for the bandwidth around the carrier frequency $f_c$ in QAM when the bits are mapped to complex symbols using a $(K, N)$ binary-to-complex block-encoder of rate $K/N$ bits per complex symbol. (As in Part (i), the pulse shape is of excess bandwidth $\eta$.)

(iii) Show that if we express the rate $\rho$ of the block-encoder in both cases in bits per complex symbol, then in the former case $\rho = 2K/N$; in the latter case $\rho = K/N$; and in both cases the bandwidth can be expressed as the same function of $R_b$, $\rho$, and $\eta$.

(iv) Show that for both PAM and QAM the transmitted power is given by

\[
P = \frac{E_s R_b}{\rho}
\]

provided that the energy per symbol $E_s$ and the rate $\rho$ are computed in both cases per complex symbol.

Hint: Exercise 18.5 is useful for Part (ii).


Exercise 18.7 (Operational PSD of Differential PSK). Let the bi-infinite sequence of IID random bits $(D_j,\ j \in \mathbb{Z})$ be mapped to the complex symbols $(C_\ell,\ \ell \in \mathbb{Z})$ as follows:

\[
C_{\ell+1} = C_\ell \exp\Bigl( i\, \frac{2\pi}{8} \bigl( 4 D_{3\ell} + 2 D_{3\ell+1} + D_{3\ell+2} \bigr) \Bigr), \quad \ell = 0, 1, 2, \ldots
\]

\[
C_\ell = C_{\ell+1} \exp\Bigl( -i\, \frac{2\pi}{8} \bigl( 4 D_{3\ell} + 2 D_{3\ell+1} + D_{3\ell+2} \bigr) \Bigr), \quad \ell = \ldots, -2, -1,
\]

where $C_0$ is independent of $(D_j)$ and uniformly distributed over the set

\[
\bigl\{ 1,\ e^{i\frac{2\pi}{8}},\ e^{i 2 \frac{2\pi}{8}},\ e^{i 3 \frac{2\pi}{8}},\ \ldots,\ e^{i 7 \frac{2\pi}{8}} \bigr\}.
\]

Find the operational PSD of the QAM signal under the assumptions of Section 18.3 on the pulse shape.


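Exercise 18.8 (PAM/QAM). Let $D_1, \ldots, D_k$ be IID random bits. These bits are mapped by a mapping $\varphi_{\mathrm{QAM}} \colon \{0,1\}^k \to \mathbb{C}^n$ to the complex symbols $C_1, \ldots, C_n$, which are then mapped to the QAM signal

\[
X_{\mathrm{QAM}}(t; D_1, \ldots, D_k) = 2A\, \mathrm{Re}\Bigl( \sum_{\ell=1}^{n} C_\ell\, \phi_{\mathrm{QAM}}(t - \ell T_{s,\mathrm{QAM}})\, e^{i2\pi f_c t} \Bigr), \quad t \in \mathbb{R},
\]

where the time shifts of $\phi_{\mathrm{QAM}}$ by integer multiples of $T_{s,\mathrm{QAM}}$ are orthonormal.


Define the real symbols $X_1, \ldots, X_{2n}$ by

\[
X_{2\ell-1} = \mathrm{Re}(C_\ell), \quad X_{2\ell} = \mathrm{Im}(C_\ell), \quad \ell \in \{1, \ldots, n\}
\]

and the corresponding PAM signal

\[
X_{\mathrm{PAM}}(t; D_1, \ldots, D_k) = A \sum_{\ell=1}^{2n} X_\ell\, \phi_{\mathrm{PAM}}(t - \ell T_{s,\mathrm{PAM}}), \quad t \in \mathbb{R},
\]

where $\phi_{\mathrm{PAM}}$ is real and its time shifts by integer multiples of $T_{s,\mathrm{PAM}}$ are orthonormal.


(i) Relate the expected energy in $X_{\mathrm{QAM}}$ to that in $X_{\mathrm{PAM}}$.


(ii) Relate the minimum squared distance

\[
\min_{(d_1,\ldots,d_k) \ne (d_1',\ldots,d_k')} \int_{-\infty}^{\infty} \bigl( X_{\mathrm{QAM}}(t; d_1, \ldots, d_k) - X_{\mathrm{QAM}}(t; d_1', \ldots, d_k') \bigr)^2 \, dt
\]

to

\[
\min_{(d_1,\ldots,d_k) \ne (d_1',\ldots,d_k')} \int_{-\infty}^{\infty} \bigl( X_{\mathrm{PAM}}(t; d_1, \ldots, d_k) - X_{\mathrm{PAM}}(t; d_1', \ldots, d_k') \bigr)^2 \, dt.
\]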


Exercise 18.9 (The Operational PSD is Nonnegative). Show that if the CSP $(Z(t),\ t \in \mathbb{R})$ is of operational PSD $S_{ZZ}$, then $S_{ZZ}(f)$ must be nonnegative outside a set of frequencies of Lebesgue measure zero.


Hint: See Exercise 15.5.


Chapter 19 


The Univariate Gaussian Distribution 


19.1 Introduction 


In many communication scenarios the noise is modeled as a Gaussian stochastic 
process. This is sometimes justified by invoking a Central Limit Theorem, which 
demonstrates that many small independent disturbances add up to a stochastic 
process that is approximately Gaussian. Another justification is mathematical 
convenience: while Gaussian processes may seem daunting at first, they are actually 
well understood and often amenable to analysis. Finally, particularly in wireline 
communications, the Gaussian model is justified because it leads to robust results 
and to good engineering design. For other scenarios, e.g., fast-moving wireless 
mobile communications, more intricate models are needed. 

Rather than starting immediately with the definition and analysis of Gaussian 
stochastic processes, we shall take the more moderate approach and start by first 
discussing Gaussian random variables. Building on that, we shall later discuss 
Gaussian random vectors in Chapter 23, and only then introduce continuous-time 
Gaussian stochastic processes in Chapter 25. 


19.2 Standard Gaussian Random Variables 


We begin with a special kind of Gaussian: the standard Gaussian. 


Definition 19.2.1 (Standard Gaussian). We say that the random variable W is a 
standard Gaussian or that it has a standard Gaussian distribution, if its 
density function f_W(·) is given by

\[
f_W(w) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{w^2}{2}}, \qquad w \in \mathbb{R}. \tag{19.1}
\]


[Figure 19.1: The standard Gaussian density function f_W(w), plotted against w.]


This density is depicted in Figure 19.1. For this definition to be meaningful, the
RHS of (19.1) had better be a valid density function, i.e., be nonnegative and
integrate to one. This is indeed the case. In fact, the RHS of (19.1) is positive,
and it integrates to one because, as we next show,

\[
\int_{-\infty}^{\infty} e^{-\frac{w^2}{2}} \, dw = \sqrt{2\pi}. \tag{19.2}
\]


This integral can be verified by computing its square as follows: 
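\[
\begin{aligned}
\left( \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}} \, dw \right)^{2}
&= \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}} \, dw \int_{-\infty}^{\infty} e^{-\frac{v^2}{2}} \, dv \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{w^2 + v^2}{2}} \, dw \, dv \\
&= \int_{0}^{\infty} \int_{-\pi}^{\pi} r \, e^{-\frac{r^2}{2}} \, d\varphi \, dr \\
&= 2\pi \int_{0}^{\infty} r \, e^{-\frac{r^2}{2}} \, dr \\
&= 2\pi \left( -e^{-\frac{r^2}{2}} \right) \Big|_{0}^{\infty} \\
&= 2\pi,
\end{aligned}
\]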


where the first equality follows by writing a² as a times a; the second by writing
the product of the integrals as a double integral over R²; the third by changing
from Cartesian to polar coordinates:

\[
w = r\cos\varphi, \quad v = r\sin\varphi, \quad r \ge 0, \quad -\pi \le \varphi < \pi, \qquad dw \, dv = r \, dr \, d\varphi;
\]

the fourth because the integrand does not depend on φ; the fifth because the
derivative of −e^{−r²/2} is r e^{−r²/2}; and where the final equality follows by direct
evaluation.
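
As a numerical sanity check of (19.2), the following minimal sketch approximates the integral with a midpoint rule on a truncated interval. The function name, the truncation limit, and the step count are illustrative assumptions and are not part of the text.

```python
import math


def gaussian_integral(limit=10.0, steps=20_000):
    """Midpoint-rule approximation of the integral of exp(-w^2/2) over [-limit, limit].

    The tails beyond |w| = 10 contribute less than exp(-50), so truncating
    the real line to this interval is harmless at double precision.
    """
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        w = -limit + (k + 0.5) * h  # midpoint of the k-th subinterval
        total += math.exp(-w * w / 2.0)
    return total * h
```

Comparing `gaussian_integral()` with `math.sqrt(2 * math.pi)` should show agreement to many decimal places.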


Note that the density (19.1) of a standard Gaussian random variable is symmetric
about the origin. Consequently, if W is a standard Gaussian, then so is −W. This
symmetry also establishes that the expectation of a standard Gaussian is zero. The
variance of a standard Gaussian can be computed using integration by parts:


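\[
\begin{aligned}
\int_{-\infty}^{\infty} w^2 \, \frac{1}{\sqrt{2\pi}} \, e^{-\frac{w^2}{2}} \, dw
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} w \left( -\frac{d}{dw} e^{-\frac{w^2}{2}} \right) dw \\
&= \frac{1}{\sqrt{2\pi}} \left( -w \, e^{-\frac{w^2}{2}} \Big|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-\frac{w^2}{2}} \, dw \right) \\
&= 1,
\end{aligned}
\]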


where the last equality follows from (19.2). 


19.3 Gaussian Random Variables 


We next define a Gaussian (not necessarily standard) random variable as the result 
of applying an affine transformation to a standard Gaussian. 


Definition 19.3.1 (Centered Gaussians and Gaussians). We say that a random 
variable X is a centered Gaussian or that it has a centered Gaussian distri- 
bution if it can be written in the form 


X=aW (19.3) 


for some deterministic a ∈ R and for some standard Gaussian W. We say that
the random variable X is Gaussian or that it has a Gaussian distribution if

\[
X = aW + b \tag{19.4}
\]

for some deterministic a, b ∈ R and for some standard Gaussian W.


Note 19.3.2. We do not preclude a from being zero. The case a = 0 leads to X 
being deterministically equal to b. We thus include the deterministic random vari- 
ables in the family of Gaussian random variables. 

Note 19.3.3. The family of Gaussian random variables is closed with respect to
affine transformations: if X is Gaussian and α, β ∈ R are deterministic, then
αX + β is also Gaussian.


Proof. Since X is Gaussian, it can be written as X = aW + b, where W is a
standard Gaussian. Consequently

\[
\begin{aligned}
\alpha X + \beta &= \alpha (aW + b) + \beta \\
&= (\alpha a) W + (\alpha b + \beta),
\end{aligned}
\]

which has the form a′W + b′ for some deterministic a′, b′ ∈ R.


If (19.4) holds, then the random variables on its RHS and LHS must have the same 
mean. The mean of a standard Gaussian is zero, so the mean of the RHS of (19.4) 




is b. The LHS is of mean E[X], and we thus conclude that in the representation 
(19.4) the deterministic constant b is uniquely determined by the mean of X, and 
in fact, 

b= E[X]. 


Similarly, since the variance of a standard Gaussian is one, the variance of the RHS
of (19.4) is a². And since the variance of the LHS is Var[X], we conclude that

\[
a^2 = \operatorname{Var}[X].
\]

Up to its sign, the deterministic constant a in the representation (19.4) is thus also
unique.

Based on the above, one might mistakenly think that for any given mean μ and
variance σ² there are two different Gaussian distributions corresponding to

\[
\sigma W + \mu \quad \text{and} \quad -\sigma W + \mu, \tag{19.5}
\]


where W is a standard Gaussian. This, however, is not the case: 


Note 19.3.4. There is only one Gaussian distribution of a given mean and variance. 


Proof. This can be seen in two different ways. The first is to note that the two
representations in (19.5) lead to the same distribution, because the standard
Gaussian W has a symmetric distribution, so σW and −σW have the same distribution.
The second is based on computing the density of σW + μ and showing that it is a
symmetric function of σ; see (19.6) ahead.


Having established that there is only one Gaussian distribution of a given mean μ
and variance σ², we denote it by

\[
\mathcal{N}(\mu, \sigma^2)
\]


and set out to study its density. Since the distribution does not depend on the
sign of σ, it is customary to require that σ be nonnegative and to refer to it as the
standard deviation. Thus, σ² is the variance and σ is the standard deviation.
If σ² = 0, then the Gaussian distribution is deterministic with mean μ and has
no density.¹ If σ² > 0, then the density can be computed from the density of
the standard Gaussian distribution as follows. If X ~ N(μ, σ²), then X has the
same distribution as μ + σW, where W is a standard Gaussian, because both X
and μ + σW are of mean μ and variance σ² (W is zero-mean and unit-variance);
both are Gaussian (Note 19.3.3); and Gaussians of identical means and variances
have identical distributions (Note 19.3.4). The density of X is thus identical to the
density of μ + σW. The density of the latter can be computed from the density
of W (19.1) to obtain that the density of a N(μ, σ²) Gaussian random variable of
positive variance is

\[
f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad x \in \mathbb{R}. \tag{19.6}
\]


[Figure 19.2: The Gaussian density function with mean μ and variance σ².]


This density is depicted in Figure 19.2. To derive the density of μ + σW from
that of W, we have used the fact that if X = g(W), where g(·) is a deterministic
continuously differentiable function whose derivative never vanishes (in our case
g(w) = μ + σw) and where W is of density f_W(·) (in our case (19.1)), then the
density f_X(·) of X is given by:

\[
f_X(x) =
\begin{cases}
0 & \text{if for no } \xi \text{ is } x = g(\xi), \\[4pt]
\dfrac{1}{|g'(\xi)|} \, f_W(\xi) & \text{if } \xi \text{ satisfies } x = g(\xi),
\end{cases}
\tag{19.7}
\]

where g′(ξ) denotes the derivative of g(·) at ξ. (For a more formal multivariate
version of this fact see Theorem 17.3.4.)


¹ Some would say that the density of a deterministic random variable is given by Dirac's Delta,
but we prefer not to use generalized functions in this book.
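
The change-of-variable rule (19.7) can be tried out on the affine map g(w) = μ + σw. The sketch below is illustrative only: the function names are assumptions, and it simply checks that applying (19.7) to the standard Gaussian density (19.1) reproduces (19.6), regardless of the sign of σ.

```python
import math


def standard_gaussian_density(w):
    """Density (19.1) of a standard Gaussian."""
    return math.exp(-w * w / 2.0) / math.sqrt(2.0 * math.pi)


def affine_transformed_density(x, mu, sigma):
    """Density at x of mu + sigma * W obtained from (19.7); requires sigma != 0."""
    xi = (x - mu) / sigma  # the unique solution of x = g(xi) for g(w) = mu + sigma * w
    return standard_gaussian_density(xi) / abs(sigma)  # |g'(xi)| = |sigma|


def gaussian_density(x, mu, sigma2):
    """Density (19.6) of a Gaussian of mean mu and positive variance sigma2."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)
```

For instance, `affine_transformed_density(0.3, 1.0, 2.0)` and `affine_transformed_density(0.3, 1.0, -2.0)` should both agree with `gaussian_density(0.3, 1.0, 4.0)`, which is the symmetry in σ used in the proof of Note 19.3.4.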


Since the family of Gaussian random variables is closed under deterministic affine
transformations (Note 19.3.3), it follows that if X ~ N(μ, σ²) with σ² > 0, then
(X − μ)/σ is also a Gaussian random variable. Since it is of zero mean and of
unit variance, it follows that it must be a standard Gaussian, because there is only
one Gaussian distribution of zero mean and unit variance (Note 19.3.4). We thus
conclude that for σ² > 0 and arbitrary μ ∈ R,

\[
\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1) \Bigr). \tag{19.8}
\]


[Figure 19.3: Q(α) is the area to the right of α under the standard Gaussian density
plot. Here it is represented by the shaded area.]


Recall that the Cumulative Distribution Function F_X(·) of a RV X is defined
for x ∈ R as

\[
\begin{aligned}
F_X(x) &= \Pr[X \le x] \\
&= \int_{-\infty}^{x} f_X(\xi) \, d\xi,
\end{aligned}
\]

where the second equality holds if X has a density function f_X(·). If W is a
standard Gaussian, then its CDF is thus given by

\[
F_W(w) = \int_{-\infty}^{w} \frac{1}{\sqrt{2\pi}} \, e^{-\frac{\xi^2}{2}} \, d\xi, \qquad w \in \mathbb{R}.
\]


There is, alas, no closed-form expression for this integral. To handle such expres- 
sions we next introduce the Q-function. 


19.4 The Q-Function 


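The Q-function maps every α ∈ R to the probability that a standard Gaussian
exceeds it:


Definition 19.4.1 (The Q-Function). The Q-function is defined by

\[
Q(\alpha) \triangleq \frac{1}{\sqrt{2\pi}} \int_{\alpha}^{\infty} e^{-\frac{\xi^2}{2}} \, d\xi, \qquad \alpha \in \mathbb{R}. \tag{19.9}
\]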


For a graphical interpretation of this integral see Figure 19.3. 
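
In software one rarely consults tables. A common way to evaluate the Q-function numerically is through the complementary error function, using the standard identity Q(α) = ½ erfc(α/√2). The following is a minimal sketch; the name `q_function` is an illustrative choice and not notation from the text.

```python
import math


def q_function(alpha):
    """Q(alpha): probability that a standard Gaussian exceeds alpha.

    Uses the identity Q(alpha) = 0.5 * erfc(alpha / sqrt(2)), where erfc is
    the complementary error function from the Python standard library.
    """
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))
```

Because `math.erfc` stays accurate for large arguments, this form avoids the loss of precision that computing one minus a CDF value would incur in the tail.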


Since the Q-function is a well-tabulated function, we are usually happy when we can 
express answers to various questions using this function. The CDF of a standard 
Gaussian W can be expressed using the Q-function as follows: 


\[
\begin{aligned}
F_W(w) &= \Pr[W \le w] \\
&= 1 - \Pr[W \ge w] \\
&= 1 - Q(w), \qquad w \in \mathbb{R},
\end{aligned}
\tag{19.10}
\]
where the second equality follows because the standard Gaussian has a density,
so Pr[W = w] = 0. Similarly, with the aid of the Q-function we can express the
probability that a standard Gaussian W lies in some given interval [a, b]:

\[
\begin{aligned}
\Pr[a \le W \le b] &= \Pr[W \ge a] - \Pr[W \ge b] \\
&= Q(a) - Q(b), \qquad a \le b.
\end{aligned}
\]

More generally, if X ~ N(μ, σ²) with σ > 0, then

\[
\begin{aligned}
\Pr[a \le X \le b] &= \Pr[X \ge a] - \Pr[X \ge b], \qquad a \le b \\
&= \Pr\Bigl[ \frac{X - \mu}{\sigma} \ge \frac{a - \mu}{\sigma} \Bigr] - \Pr\Bigl[ \frac{X - \mu}{\sigma} \ge \frac{b - \mu}{\sigma} \Bigr], \qquad \sigma > 0 \\
&= Q\Bigl( \frac{a - \mu}{\sigma} \Bigr) - Q\Bigl( \frac{b - \mu}{\sigma} \Bigr), \qquad (a \le b, \ \sigma > 0),
\end{aligned}
\tag{19.11}
\]

where the last equality follows because (X − μ)/σ is a standard Gaussian; see
(19.8). Letting b tend to +∞ in (19.11), we obtain the probability of a half ray:

\[
\Pr[X \ge a] = Q\Bigl( \frac{a - \mu}{\sigma} \Bigr), \qquad \sigma > 0. \tag{19.12a}
\]

And letting a tend to −∞ we obtain

\[
\Pr[X \le b] = 1 - Q\Bigl( \frac{b - \mu}{\sigma} \Bigr), \qquad \sigma > 0. \tag{19.12b}
\]
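
The three formulas (19.11), (19.12a), and (19.12b) translate directly into code. The sketch below assumes the Q-function is evaluated via the standard identity Q(α) = ½ erfc(α/√2); all function names are illustrative.

```python
import math


def q_function(alpha):
    """Q(alpha) evaluated through the complementary error function."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def prob_interval(a, b, mu, sigma):
    """Pr[a <= X <= b] for X ~ N(mu, sigma^2) with sigma > 0 and a <= b; see (19.11)."""
    return q_function((a - mu) / sigma) - q_function((b - mu) / sigma)


def prob_at_least(a, mu, sigma):
    """Pr[X >= a] for X ~ N(mu, sigma^2) with sigma > 0; see (19.12a)."""
    return q_function((a - mu) / sigma)


def prob_at_most(b, mu, sigma):
    """Pr[X <= b] for X ~ N(mu, sigma^2) with sigma > 0; see (19.12b)."""
    return 1.0 - q_function((b - mu) / sigma)
```

For example, `prob_interval(mu - sigma, mu + sigma, mu, sigma)` gives the probability that a Gaussian lies within one standard deviation of its mean, which is roughly 0.683 for every μ and every σ > 0.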


The Q-function is usually only tabulated for nonnegative arguments, because the
standard Gaussian density (19.1) is symmetric: if W ~ N(0,1) then, by the
symmetry of its density,

\[
\begin{aligned}
\Pr[W \ge -\alpha] &= \Pr[W \le \alpha] \\
&= 1 - \Pr[W \ge \alpha], \qquad \alpha \in \mathbb{R}.
\end{aligned}
\]

Consequently, as illustrated in Figure 19.4,

\[
Q(\alpha) + Q(-\alpha) = 1, \qquad \alpha \in \mathbb{R}, \tag{19.13}
\]

and it suffices to tabulate the Q-function for nonnegative arguments. Note that,
by (19.13),

\[
Q(0) = \frac{1}{2}. \tag{19.14}
\]


An alternative expression for the Q-function as an integral with fixed integration 
limits is known as Craig’s formula: 


\[
Q(\alpha) = \frac{1}{\pi} \int_{0}^{\pi/2} e^{-\frac{\alpha^2}{2\sin^2\varphi}} \, d\varphi, \qquad \alpha \ge 0. \tag{19.15}
\]


[Figure 19.4: The identity Q(α) + Q(−α) = 1.]

[Figure 19.5: Use of polar coordinates to compute ½ Q(α); the area of integration is the region x ≥ 0, y ≥ α.]


This expression can be derived by computing a two-dimensional integral in two
different ways as follows. Let X ~ N(0,1) and Y ~ N(0,1) be independent.
Consider the probability of the event “X ≥ 0 and Y ≥ α” where α ≥ 0. Since the
two random variables are independent, it follows that

\[
\begin{aligned}
\Pr[X \ge 0 \text{ and } Y \ge \alpha] &= \Pr[X \ge 0] \, \Pr[Y \ge \alpha] \\
&= \frac{1}{2} \, Q(\alpha),
\end{aligned}
\tag{19.16}
\]


where the second equality follows from (19.14). We now proceed to compute the 
LHS of the above in polar coordinates centered at the origin (Figure 19.5): 


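\[
\begin{aligned}
\Pr[X \ge 0 \text{ and } Y \ge \alpha]
&= \int_{0}^{\infty} \int_{\alpha}^{\infty} \frac{1}{2\pi} \, e^{-\frac{x^2 + y^2}{2}} \, dy \, dx \\
&= \int_{0}^{\pi/2} \int_{\frac{\alpha}{\sin\varphi}}^{\infty} \frac{1}{2\pi} \, e^{-\frac{r^2}{2}} \, r \, dr \, d\varphi, \qquad \alpha \ge 0 \\
&= \frac{1}{2\pi} \int_{0}^{\pi/2} \int_{\frac{\alpha^2}{2\sin^2\varphi}}^{\infty} e^{-t} \, dt \, d\varphi \\
&= \frac{1}{2\pi} \int_{0}^{\pi/2} e^{-\frac{\alpha^2}{2\sin^2\varphi}} \, d\varphi, \qquad \alpha \ge 0,
\end{aligned}
\tag{19.17}
\]

where we have performed the change of variable t ≜ r²/2. The integral
representation (19.15) now follows from (19.16) & (19.17).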
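
Craig's formula (19.15) is convenient numerically because its integration limits are fixed. The following minimal sketch evaluates it with a midpoint rule and lets one compare the result against the erfc-based expression Q(α) = ½ erfc(α/√2). The function names and the step count are illustrative assumptions.

```python
import math


def q_erfc(alpha):
    """Reference value of Q(alpha) via the complementary error function."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def q_craig(alpha, steps=2000):
    """Midpoint-rule evaluation of Craig's formula (19.15); valid for alpha >= 0."""
    if alpha < 0:
        raise ValueError("Craig's formula (19.15) requires alpha >= 0")
    h = (math.pi / 2.0) / steps
    total = 0.0
    for k in range(steps):
        phi = (k + 0.5) * h  # midpoints never hit phi = 0, where sin(phi) = 0
        total += math.exp(-alpha * alpha / (2.0 * math.sin(phi) ** 2))
    return total * h / math.pi
```

For negative arguments one would first apply the symmetry Q(α) = 1 − Q(−α) and then use the formula on −α.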


We next describe various approximations for the Q-function. We are particularly
interested in its value for large arguments.² Since Q(α) is the probability that
a standard Gaussian exceeds α, it follows that lim_{α→∞} Q(α) = 0. Thus, large
arguments to the Q-function correspond to small values of the Q-function. The
following bounds justify the approximation

\[
Q(\alpha) \approx \frac{1}{\sqrt{2\pi\alpha^2}} \, e^{-\frac{\alpha^2}{2}}, \qquad \alpha \gg 1. \tag{19.18}
\]
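Proposition 19.4.2 (Estimates for the Q-function). The Q-function is bounded
by

\[
\frac{1}{\sqrt{2\pi\alpha^2}} \, e^{-\frac{\alpha^2}{2}} \Bigl( 1 - \frac{1}{\alpha^2} \Bigr) < Q(\alpha) < \frac{1}{\sqrt{2\pi\alpha^2}} \, e^{-\frac{\alpha^2}{2}}, \qquad \alpha > 0, \tag{19.19}
\]

and

\[
Q(\alpha) \le \frac{1}{2} \, e^{-\frac{\alpha^2}{2}}, \qquad \alpha \ge 0. \tag{19.20}
\]


² In Digital Communications this corresponds to scenarios with low probability of error.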


Proof. The proof of (19.19) is omitted (but see Exercise 19.3). Inequality (19.20)
is proved by replacing the integrand in (19.15) with its maximal value, namely, its
value at φ = π/2. We shall see an alternative proof in Section 20.10.
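
To get a feel for how tight these estimates are, the bounds of Proposition 19.4.2 can be tabulated next to the Q-function itself. This is only an illustrative sketch: the function names are assumptions, and Q is evaluated via the standard identity Q(α) = ½ erfc(α/√2).

```python
import math


def q_function(alpha):
    """Q(alpha) evaluated through the complementary error function."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def q_bounds(alpha):
    """Return (lower, upper, simple_upper) for alpha > 0.

    lower and upper are the two sides of (19.19); simple_upper is the RHS of (19.20).
    """
    base = math.exp(-alpha * alpha / 2.0) / math.sqrt(2.0 * math.pi * alpha * alpha)
    lower = base * (1.0 - 1.0 / (alpha * alpha))
    upper = base
    simple_upper = 0.5 * math.exp(-alpha * alpha / 2.0)
    return lower, upper, simple_upper
```

Evaluating `q_bounds(alpha)` for growing `alpha` shows the two sides of (19.19) closing in on `q_function(alpha)`, which is the sense in which (19.18) is a good approximation for large arguments. For α < 1 the lower bound is negative and hence uninformative.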


19.5 Integrals of Exponentiated Quadratics 


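The fact that (19.6) is a density and hence integrates to one, i.e.,

\[
\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, dx = 1, \tag{19.21}
\]

can be used to compute seemingly complicated integrals. Here we shall show how
(19.21) can be used to derive the identity

\[
\int_{-\infty}^{\infty} e^{-\alpha x^2 \pm \beta x} \, dx = \sqrt{\frac{\pi}{\alpha}} \, e^{\frac{\beta^2}{4\alpha}}, \qquad \beta \in \mathbb{R}, \ \alpha > 0. \tag{19.22}
\]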


Note that this identity is meaningless when α ≤ 0, because in this case the
integrand is not integrable. For example, if α < 0, then the integrand tends to infinity
as |x| tends to ∞. If α = 0 and β ≠ 0, then the integrand tends to infinity either
as x tends to +∞ or as x tends to −∞ (depending on the sign of β). Finally, if
both α and β are zero, then the integrand is 1, which is not integrable. Note also
that, by considering the change of variable u ≜ −x, one can verify that the sign
of β on the LHS of this identity is immaterial.


The trick to deriving (19.22) is to complete the exponent to a square and to then 
apply (19.21): 


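\[
\begin{aligned}
\int_{-\infty}^{\infty} e^{-\alpha x^2 \pm \beta x} \, dx
&= \int_{-\infty}^{\infty} \exp\Biggl( -\frac{x^2 \mp \frac{\beta}{\alpha} x}{2 (1/\sqrt{2\alpha})^2} \Biggr) dx \\
&= \int_{-\infty}^{\infty} \exp\Biggl( -\frac{\bigl( x \mp \frac{\beta}{2\alpha} \bigr)^2}{2 (1/\sqrt{2\alpha})^2} + \frac{\beta^2}{4\alpha} \Biggr) dx \\
&= e^{\frac{\beta^2}{4\alpha}} \int_{-\infty}^{\infty} \exp\Biggl( -\frac{\bigl( x \mp \frac{\beta}{2\alpha} \bigr)^2}{2 (1/\sqrt{2\alpha})^2} \Biggr) dx \\
&= e^{\frac{\beta^2}{4\alpha}} \sqrt{2\pi (1/\sqrt{2\alpha})^2} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi (1/\sqrt{2\alpha})^2}} \exp\Biggl( -\frac{\bigl( x \mp \frac{\beta}{2\alpha} \bigr)^2}{2 (1/\sqrt{2\alpha})^2} \Biggr) dx \\
&= e^{\frac{\beta^2}{4\alpha}} \sqrt{2\pi (1/\sqrt{2\alpha})^2} \\
&= \sqrt{\frac{\pi}{\alpha}} \, e^{\frac{\beta^2}{4\alpha}},
\end{aligned}
\]

where the first equality follows by rewriting the integrand so that the term x² in
the numerator is of coefficient one and so that the denominator has the form 2σ²
for σ which turns out here to be given by σ ≜ 1/√(2α); the second follows by
completing the square; the third by taking the multiplicative constant out of the
integral; the fourth by multiplying and dividing the integral by √(2πσ²) so as to
bring the integrand to the form of the density of a Gaussian; the fifth by (19.21);
and the sixth equality by trivial algebra.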
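
Identity (19.22) is easy to check numerically. In the sketch below the closed form is compared with a midpoint-rule approximation centered at the peak of the integrand; the function names, the integration window of twelve "standard deviations" 1/√(2α), and the step count are illustrative assumptions.

```python
import math


def exp_quadratic_closed_form(alpha, beta):
    """RHS of (19.22); requires alpha > 0."""
    return math.sqrt(math.pi / alpha) * math.exp(beta * beta / (4.0 * alpha))


def exp_quadratic_numeric(alpha, beta, steps=20_000):
    """Midpoint-rule approximation of the integral of exp(-alpha x^2 + beta x)."""
    center = beta / (2.0 * alpha)                # location of the integrand's peak
    half_width = 12.0 / math.sqrt(2.0 * alpha)   # 12 times sigma = 1/sqrt(2 alpha)
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = center - half_width + (k + 0.5) * h
        total += math.exp(-alpha * x * x + beta * x)
    return total * h
```

Running both functions with β and with −β should give the same value, in line with the remark that the sign of β is immaterial.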


19.6 The Moment Generating Function 


As an application of (19.22) we next derive the Moment Generating Function
(MGF) of a Gaussian RV. Recall that the MGF of a RV X is denoted by M_X(·)
and is given by

\[
M_X(\theta) = \mathrm{E}\bigl[ e^{\theta X} \bigr] \tag{19.23}
\]

for all θ ∈ R for which this expectation is finite. If X has density f_X(·), then its
MGF can be written as

\[
M_X(\theta) = \int_{-\infty}^{\infty} f_X(x) \, e^{\theta x} \, dx, \tag{19.24}
\]

thus highlighting the connection between the MGF of X and the double-sided
Laplace Transform of its density.


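If X ~ N(μ, σ²) where σ² > 0, then

\[
\begin{aligned}
M_X(\theta) &= \int_{-\infty}^{\infty} f_X(x) \, e^{\theta x} \, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}} \, e^{\theta x} \, dx \\
&= \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{\xi^2}{2\sigma^2}} \, e^{\theta(\xi + \mu)} \, d\xi \\
&= e^{\theta\mu} \, \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{\xi^2}{2\sigma^2} + \theta\xi} \, d\xi \\
&= e^{\theta\mu} \, \frac{1}{\sqrt{2\pi\sigma^2}} \, \sqrt{\frac{\pi}{1/(2\sigma^2)}} \; e^{\frac{\theta^2}{4/(2\sigma^2)}} \\
&= e^{\theta\mu + \frac{1}{2}\theta^2\sigma^2}, \qquad \theta \in \mathbb{R},
\end{aligned}
\]

where the first equality follows from (19.24); the second from (19.6); the third by
changing the integration variable to ξ ≜ x − μ; the fourth by rearranging terms;
the fifth from (19.22) with the substitution of 1/(2σ²) for α and of θ for β; and the
final by simple algebra. This can be verified to hold also when σ² = 0. Thus,

\[
\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( M_X(\theta) = e^{\theta\mu + \frac{1}{2}\theta^2\sigma^2}, \quad \theta \in \mathbb{R} \Bigr). \tag{19.25}
\]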
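
A quick Monte Carlo experiment illustrates (19.25). The sketch below is not part of the derivation: it simply averages exp(θX) over pseudo-random draws of X = μ + σW and compares the result with the closed form. The function names, the sample size, and the fixed seed are illustrative assumptions.

```python
import math
import random


def gaussian_mgf(theta, mu, sigma2):
    """Closed-form MGF (19.25) of a Gaussian of mean mu and variance sigma2."""
    return math.exp(theta * mu + 0.5 * theta * theta * sigma2)


def empirical_mgf(theta, mu, sigma2, num_samples=200_000, seed=0):
    """Sample average of exp(theta * X) over draws X = mu + sigma * W."""
    rng = random.Random(seed)  # fixed seed so that repeated runs give the same estimate
    sigma = math.sqrt(sigma2)
    total = 0.0
    for _ in range(num_samples):
        total += math.exp(theta * (mu + sigma * rng.gauss(0.0, 1.0)))
    return total / num_samples
```

The estimate fluctuates from seed to seed, so only approximate agreement should be expected; the fluctuations grow with |θ| because exp(θX) then has a heavier right tail.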
19.7 The Characteristic Function of Gaussians


19.7.1 The Characteristic Function 


Recall that the Characteristic Function Φ_X(·) of a random variable X is defined
for every ϖ ∈ R by

\[
\begin{aligned}
\Phi_X(\varpi) &= \mathrm{E}\bigl[ e^{i\varpi X} \bigr] \\
&= \int_{-\infty}^{\infty} f_X(x) \, e^{i\varpi x} \, dx,
\end{aligned}
\]

where the second equality holds if X has density f_X(·). The second equality
demonstrates that the characteristic function is related to the Fourier Transform of the
density function but, by convention, there are no 2π's, and the complex exponential
is not conjugated. If we allow for complex arguments to the MGF (by performing
an analytic continuation), then the characteristic function can be viewed as the
MGF evaluated on the imaginary axis:

\[
\Phi_X(\varpi) = M_X(i\varpi), \qquad \varpi \in \mathbb{R}. \tag{19.26}
\]


Some of the properties of the characteristic function are summarized next. 


Proposition 19.7.1 (On the Characteristic Function). Let X be a random variable
of characteristic function Φ_X(·).


(i) If E[|X|ⁿ] < ∞ for some n ∈ N, then Φ_X(·) is differentiable n times and the
ν-th moment of X is related to the ν-th derivative of Φ_X(·) at zero via the
relation

\[
\mathrm{E}\bigl[ X^\nu \bigr] = \frac{1}{i^\nu} \, \frac{d^\nu}{d\varpi^\nu} \Phi_X(\varpi) \bigg|_{\varpi = 0}, \qquad \nu \in \{1, \ldots, n\}. \tag{19.27}
\]


(ii) Two random variables of identical characteristic functions must have the
same distribution.


(iii) If X and Y are independent random variables of characteristic functions
Φ_X(·) and Φ_Y(·), then the characteristic function Φ_{X+Y}(·) of their sum is
given by the product of their characteristic functions:

\[
\bigl( X \ \& \ Y \text{ independent} \bigr) \implies \bigl( \Phi_{X+Y}(\varpi) = \Phi_X(\varpi) \, \Phi_Y(\varpi), \quad \varpi \in \mathbb{R} \bigr). \tag{19.28}
\]


Proof. For a proof of Part (i) see (Shiryaev, 1996, Chapter II, § 12.3, Theorem 1).
For Part (ii) see (Shiryaev, 1996, Chapter II, § 12.4, Theorem 2). For Part (iii) see
(Shiryaev, 1996, Chapter II, § 12.5, Theorem 4).


For X ~ N(μ, σ²) we obtain from (19.26) and (19.25) that³

\[
\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( \Phi_X(\varpi) = e^{i\varpi\mu - \frac{1}{2}\varpi^2\sigma^2}, \quad \varpi \in \mathbb{R} \Bigr). \tag{19.29}
\]


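19.7.2 Moments


Since the standard Gaussian density decays faster than exponentially, it possesses
moments of all orders. Those can be computed from the characteristic function
(19.29) using Proposition 19.7.1 (i) by repeated differentiation. Using this approach
we obtain that the moments of a standard Gaussian are

\[
\mathrm{E}\bigl[ W^\nu \bigr] =
\begin{cases}
1 \times 3 \times \cdots \times (\nu - 1) & \text{if } \nu \text{ is even,} \\
0 & \text{if } \nu \text{ is odd,}
\end{cases}
\qquad W \sim \mathcal{N}(0, 1). \tag{19.30}
\]

We mention here in passing that⁴

\[
\mathrm{E}\bigl[ |W|^\nu \bigr] =
\begin{cases}
1 \times 3 \times \cdots \times (\nu - 1) & \text{if } \nu \text{ is even,} \\
\sqrt{\dfrac{2}{\pi}} \; 2^{(\nu-1)/2} \, \Bigl( \dfrac{\nu - 1}{2} \Bigr)! & \text{if } \nu \text{ is odd,}
\end{cases}
\qquad W \sim \mathcal{N}(0, 1) \tag{19.31}
\]

(Johnson, Kotz, and Balakrishnan, 1994a, Chapter 18, Section 3, Equation (18.13)).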
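
The moment formula (19.30) can be checked by integrating w^ν against the standard Gaussian density numerically. The following is a minimal sketch under stated assumptions: the function names, the truncation of the real line to [−12, 12], and the step count are all illustrative choices.

```python
import math


def standard_gaussian_moment(nu):
    """Closed form (19.30): 1 x 3 x ... x (nu - 1) for even nu, and 0 for odd nu."""
    if nu % 2 == 1:
        return 0
    result = 1
    for k in range(1, nu, 2):  # odd factors 1, 3, ..., nu - 1
        result *= k
    return result


def standard_gaussian_moment_numeric(nu, limit=12.0, steps=48_000):
    """Midpoint-rule approximation of the nu-th moment of a standard Gaussian."""
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps):
        w = -limit + (k + 0.5) * h
        total += (w ** nu) * math.exp(-w * w / 2.0)
    return total * h / math.sqrt(2.0 * math.pi)
```

For ν = 2 this reproduces the unit variance computed by integration by parts in Section 19.2, and for ν = 4 it gives 3.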


19.7.3 Sums of Independent Gaussians


Using the characteristic function we next show:

Proposition 19.7.2 (The Sum of Two Independent Gaussians Is Gaussian). The
sum of two independent Gaussian random variables is a Gaussian RV.⁵

Proof. Let X ~ N(μ_x, σ_x²) and Y ~ N(μ_y, σ_y²) be independent. By (19.29),

\[
\begin{aligned}
\Phi_X(\varpi) &= e^{i\varpi\mu_x - \frac{1}{2}\varpi^2\sigma_x^2}, \qquad \varpi \in \mathbb{R}, \\
\Phi_Y(\varpi) &= e^{i\varpi\mu_y - \frac{1}{2}\varpi^2\sigma_y^2}, \qquad \varpi \in \mathbb{R}.
\end{aligned}
\]

Since the characteristic function of the sum of two independent random variables
is equal to the product of their characteristic functions (19.28),

\[
\begin{aligned}
\Phi_{X+Y}(\varpi) &= \Phi_X(\varpi) \, \Phi_Y(\varpi) \\
&= e^{i\varpi\mu_x - \frac{1}{2}\varpi^2\sigma_x^2} \; e^{i\varpi\mu_y - \frac{1}{2}\varpi^2\sigma_y^2} \\
&= e^{i\varpi(\mu_x + \mu_y) - \frac{1}{2}\varpi^2(\sigma_x^2 + \sigma_y^2)}, \qquad \varpi \in \mathbb{R}.
\end{aligned}
\]

By (19.29), this is also the characteristic function of a N(μ_x + μ_y, σ_x² + σ_y²) RV.
Since the characteristic function of a random variable fully determines its law
(Proposition 19.7.1 (ii)), X + Y must be N(μ_x + μ_y, σ_x² + σ_y²).


³ It does require a (small) leap of faith to accept that (19.25) also holds for complex θ. This can
be justified using analytic continuation. But there are also direct ways of deriving (19.29); see, for
example, (Williams, 1991, Chapter E, Exercise E16.4) or (Shiryaev, 1996, Chapter II, Section 12,
Paragraph 2, Example 2). Another approach is to express dΦ_X(ϖ)/dϖ as E[iX e^{iϖX}] and to
use integration by parts to verify that the latter's expectation is equal to −ϖΦ_X(ϖ) and to
then solve the differential equation dΦ_X(ϖ)/dϖ = −ϖΦ_X(ϖ) with the condition Φ_X(0) = 1 to
obtain that ln Φ_X(ϖ) = −ϖ²/2.

⁴ The distribution of |W| is sometimes called half-normal. It is the positive square root of
the central chi-squared distribution with one degree of freedom.

⁵ More generally, as we shall see in Chapter 23, X + Y is Gaussian whenever X and Y are
jointly Gaussian. And independent Gaussians are jointly Gaussian.


Using induction one can generalize this proposition to any finite number of random
variables: if X_1, ..., X_n are independent Gaussian random variables, then
their sum is Gaussian. Applying this to α_1 X_1, ..., α_n X_n, which are independent
Gaussians whenever X_1, ..., X_n are independent Gaussians, we obtain:


Proposition 19.7.3 (Linear Combinations of Independent Gaussians). If the random
variables X_1, ..., X_n are independent Gaussians, and if α_1, ..., α_n ∈ R are
deterministic, then the RV Y = Σ_{ℓ=1}^{n} α_ℓ X_ℓ is Gaussian with mean and variance

\[
\begin{aligned}
\mathrm{E}[Y] &= \sum_{\ell=1}^{n} \alpha_\ell \, \mathrm{E}[X_\ell], \\
\operatorname{Var}[Y] &= \sum_{\ell=1}^{n} \alpha_\ell^2 \, \operatorname{Var}[X_\ell].
\end{aligned}
\]
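
Proposition 19.7.3 reduces probability calculations for linear combinations of independent Gaussians to bookkeeping of means and variances followed by one evaluation of the Q-function. The sketch below illustrates this; the function names are assumptions, and Q is evaluated via the standard identity Q(α) = ½ erfc(α/√2).

```python
import math


def q_function(alpha):
    """Q(alpha) evaluated through the complementary error function."""
    return 0.5 * math.erfc(alpha / math.sqrt(2.0))


def linear_combination_law(coeffs, means, variances):
    """Mean and variance of Y = sum_l coeffs[l] * X_l for independent Gaussians X_l."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    variance = sum(a * a * v for a, v in zip(coeffs, variances))
    return mean, variance


def prob_combination_at_least(threshold, coeffs, means, variances):
    """Pr[Y >= threshold] for the linear combination Y above."""
    mean, variance = linear_combination_law(coeffs, means, variances)
    if variance == 0.0:
        # Degenerate case: Y is deterministically equal to its mean.
        return 1.0 if mean >= threshold else 0.0
    return q_function((threshold - mean) / math.sqrt(variance))
```

For example, with X_1 of mean 2 and variance 1 and X_2 of mean 0.5 and variance 3, the difference X_1 − X_2 is Gaussian of mean 1.5 and variance 4, so `prob_combination_at_least(1.5, [1, -1], [2.0, 0.5], [1.0, 3.0])` evaluates to one half.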


19.8 Central and Noncentral Chi-Square Random Variables 


We summarize here some of the definitions and main properties of the central and
noncentral χ² distributions and of some related distributions. We shall only use
three results from this section: that the sum of the squares of two independent
N(0,1) random variables has a mean-2 exponential distribution; that the
distribution of the sum of the squares of n independent Gaussian random variables of
unit-variance and possibly different means depends only on n and on the sum of
the squared means; and that the MGF of this latter sum has a simple explicit form.


These results can be derived quite easily from the MGF of a squared Gaussian RV,
an MGF which, using (19.22), can be shown to be given by

\[
\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( M_{X^2}(\theta) = \frac{1}{\sqrt{1 - 2\sigma^2\theta}} \; e^{-\frac{\mu^2}{2\sigma^2}} \; e^{\frac{\mu^2}{2\sigma^2(1 - 2\sigma^2\theta)}}, \quad \theta < \frac{1}{2\sigma^2} \Bigr). \tag{19.32}
\]

With a small leap of faith we can assume that (19.32) also holds for complex
arguments whose real part is smaller than 1/(2σ²) so that upon substituting iϖ
for θ we can obtain the characteristic function

\[
\bigl( X \sim \mathcal{N}(\mu, \sigma^2) \bigr) \implies \Bigl( \Phi_{X^2}(\varpi) = \frac{1}{\sqrt{1 - i 2\sigma^2\varpi}} \; e^{-\frac{\mu^2}{2\sigma^2}} \; e^{\frac{\mu^2}{2\sigma^2(1 - i 2\sigma^2\varpi)}}, \quad \varpi \in \mathbb{R} \Bigr). \tag{19.33}
\]


19.8.1 The Central χ² Distribution and Related Distributions


The central χ² distribution with n degrees of freedom is denoted by χ²_n
and is defined as the distribution of the sum of the squares of n IID zero-mean
unit-variance Gaussian random variables:

\[
\bigl( X_1, \ldots, X_n \sim \text{IID } \mathcal{N}(0, 1) \bigr) \implies \Bigl( \sum_{j=1}^{n} X_j^2 \sim \chi^2_n \Bigr). \tag{19.34}
\]


Using the fact that the MGF of the sum of independent random variables is the
product of their MGFs and using (19.32) with μ = 0 and σ² = 1, we obtain that
the MGF of the central χ² distribution with n degrees of freedom is given by

\[
\mathrm{E}\bigl[ e^{\theta \chi^2_n} \bigr] = \frac{1}{(1 - 2\theta)^{n/2}}, \qquad \theta < \frac{1}{2}. \tag{19.35}
\]

Similarly, by (19.33) and the fact that the characteristic function of the sum of
independent random variables is the product of their characteristic functions (or
by substituting iϖ for θ in (19.35)), we obtain that the characteristic function of
the central χ² distribution with n degrees of freedom is given by

\[
\mathrm{E}\bigl[ e^{i\varpi \chi^2_n} \bigr] = \frac{1}{(1 - 2i\varpi)^{n/2}}, \qquad \varpi \in \mathbb{R}. \tag{19.36}
\]


Notice that for n = 2 this characteristic function is given by ϖ ↦ 1/(1 − i2ϖ),
which is the characteristic function of the mean-2 exponential density

\[
\frac{1}{2} \, e^{-x/2} \; \mathrm{I}\{x > 0\}, \qquad x \in \mathbb{R}.
\]

Since two random variables of identical characteristic functions must be of equal
law (Proposition 19.7.1 (ii)), we conclude:

Note 19.8.1. The central χ² distribution with two degrees of freedom χ²_2 is the
mean-2 exponential distribution.
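
Note 19.8.1 lends itself to a simple simulation. The sketch below draws pairs of independent standard Gaussians, sums their squares, and lets one compare the empirical mean and tail probabilities with those of the mean-2 exponential distribution, whose tail is Pr[X > t] = e^{−t/2} for t ≥ 0. The function names, sample size, and fixed seed are illustrative assumptions.

```python
import math
import random


def sample_sum_of_two_squares(num_samples, seed=0):
    """Draw num_samples realizations of W1^2 + W2^2 with W1, W2 IID N(0, 1)."""
    rng = random.Random(seed)  # fixed seed for repeatability
    return [rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2
            for _ in range(num_samples)]


def empirical_tail(samples, t):
    """Fraction of the samples that exceed t."""
    return sum(1 for s in samples if s > t) / len(samples)


def exponential_tail(t, mean=2.0):
    """Pr[X > t] for an exponential RV of the given mean, t >= 0."""
    return math.exp(-t / mean)
```

With a few hundred thousand samples the empirical tail at, say, t = 3 should be close to `exponential_tail(3.0)`, up to the usual Monte Carlo fluctuations.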


From (19.36) and the relationship between the moments of a distribution and the
derivatives at zero of its characteristic function (19.27), one can verify that the
ν-th moment of a χ²_n RV is given by

\[
\mathrm{E}\bigl[ (\chi^2_n)^\nu \bigr] = n \times (n + 2) \times \cdots \times \bigl( n + 2(\nu - 1) \bigr), \qquad \nu \in \mathbb{N}, \tag{19.37}
\]

so the mean is n; the second moment is n(n + 2); and the variance is 2n.
Since the sum of the squares of random variables must be nonnegative, the density
of the χ²_n distribution is zero on the negative numbers. It is given by

\[
f_{\chi^2_n}(x) = \frac{1}{2^{n/2} \, \Gamma(n/2)} \; e^{-x/2} \, x^{(n/2) - 1} \; \mathrm{I}\{x > 0\}, \tag{19.38}
\]

where Γ(·) is the Gamma function, which is defined by

\[
\Gamma(\xi) \triangleq \int_{0}^{\infty} e^{-t} \, t^{\xi - 1} \, dt, \qquad \xi > 0. \tag{19.39}
\]



If the number of degrees of freedom is even, then the density has a particularly
simple form:

\[
f_{\chi^2_{2k}}(x) = \frac{1}{2^k \, (k - 1)!} \; e^{-x/2} \, x^{k - 1} \; \mathrm{I}\{x > 0\}, \qquad k \in \mathbb{N}, \tag{19.40}
\]

thus demonstrating again that when the number of degrees of freedom is two, the
central χ² distribution is the mean-2 exponential distribution (Note 19.8.1).

A related distribution is the generalized Rayleigh distribution, which is the
distribution of the square root of a random variable having a χ²_n distribution. The
density of the generalized Rayleigh distribution is given by

\[
f_{\sqrt{\chi^2_n}}(x) = \frac{2}{2^{n/2} \, \Gamma(n/2)} \; x^{n - 1} \, e^{-x^2/2} \; \mathrm{I}\{x > 0\}, \tag{19.41}
\]

and its moments by

\[
\mathrm{E}\Bigl[ \bigl( \sqrt{\chi^2_n} \bigr)^\nu \Bigr] = \frac{2^{\nu/2} \, \Gamma\bigl( \frac{n + \nu}{2} \bigr)}{\Gamma(n/2)}, \qquad \nu \in \mathbb{N}. \tag{19.42}
\]

The Rayleigh distribution is the distribution of the square root of a χ²_2 random
variable, i.e., the distribution of the square root of a mean-2 exponential random
variable. The density of the Rayleigh distribution is obtained by setting n = 2 in
(19.41):

\[
f_{\sqrt{\chi^2_2}}(x) = x \, e^{-x^2/2} \; \mathrm{I}\{x > 0\}. \tag{19.43}
\]

19.8.2 The Noncentral χ² Distribution and Related Distributions


Using (19.32) and the fact that the MGF of the sum of independent random
variables is the product of their MGFs, we obtain that if X_1, ..., X_n are independent
with X_j ~ N(μ_j, σ²), then the MGF of Σ_j X_j² is given by

\[
\Bigl( \frac{1}{\sqrt{1 - 2\sigma^2\theta}} \Bigr)^{n} \; e^{-\frac{\sum_{j=1}^{n} \mu_j^2}{2\sigma^2}} \; e^{\frac{\sum_{j=1}^{n} \mu_j^2}{2\sigma^2(1 - 2\sigma^2\theta)}}, \qquad \theta < \frac{1}{2\sigma^2}. \tag{19.44}
\]

Noting that this MGF depends on the individual means μ_1, ..., μ_n only via the
sum of their squares Σ_j μ_j², we obtain:


Note 19.8.2. The distribution of the sum of the squares of independent
equivariance Gaussians is determined by their number, their common variance, and by the
sum of the squares of their means.


The distribution of the sum of the squares of n independent unit-variance Gaussians
whose squared means sum to λ is called the noncentral χ² distribution with n
degrees of freedom and noncentrality parameter λ. This distribution is denoted by
χ²_{n,λ}. Substituting σ² = 1 in (19.44) we obtain that the MGF of the χ²_{n,λ}
distribution is

\[
\mathrm{E}\bigl[ e^{\theta \chi^2_{n,\lambda}} \bigr] = \Bigl( \frac{1}{\sqrt{1 - 2\theta}} \Bigr)^{n} \; e^{-\frac{\lambda}{2}} \; e^{\frac{\lambda}{2(1 - 2\theta)}}, \qquad \theta < \frac{1}{2}. \tag{19.45}
\]

A special case of this distribution is the central χ² distribution, which corresponds
to the case where the noncentrality parameter is zero.


Explicit expressions for the density of the noncentral χ² distribution can be found
in (Johnson, Kotz, and Balakrishnan, 1994b, Chapter 29, Equation (29.4)) and in
(Simon, 2002, Chapter 2). An interesting representation of this density in terms
of the density f_{χ²_{n+2j}} of the central χ² distribution is:

\[
f_{\chi^2_{n,\lambda}}(x) = \sum_{j=0}^{\infty} \Bigl( e^{-\lambda/2} \, \frac{(\lambda/2)^j}{j!} \Bigr) \, f_{\chi^2_{n+2j}}(x), \qquad x \in \mathbb{R}. \tag{19.46}
\]

It demonstrates that a χ²_{n,λ} random variable X can be generated by picking a
random integer j according to the Poisson distribution of parameter λ/2 and by
then generating a central χ² random variable of n + 2j degrees of freedom. That
is, to generate a χ²_{n,λ} random variable X, generate some random variable J taking
value in the nonnegative integers according to the law

\[
\Pr[J = j] = e^{-\lambda/2} \, \frac{(\lambda/2)^j}{j!}, \qquad j = 0, 1, \ldots \tag{19.47}
\]

and then generate X according to the central χ² distribution with n + 2j degrees of
freedom, where j is the outcome of J.
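
The two-stage recipe above can be sketched in a few lines and compared with the direct construction as a sum of squares of unit-variance Gaussians. Everything below is an illustrative assumption rather than a prescribed implementation: the function names are made up, the Poisson sampler uses Knuth's multiplication method (adequate for small rates only), and the direct sampler places the entire noncentrality on the first component, which is allowed because the law depends on the means only through the sum of their squares (Note 19.8.2).

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method for a Poisson variate (suitable for small rates)."""
    threshold = math.exp(-rate)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_central_chi_square(degrees, rng):
    """Sum of the squares of `degrees` IID standard Gaussians."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(degrees))


def sample_noncentral_via_mixture(n, lam, rng):
    """Draw J according to (19.47), then a central chi-square with n + 2J degrees."""
    j = sample_poisson(lam / 2.0, rng)
    return sample_central_chi_square(n + 2 * j, rng)


def sample_noncentral_directly(n, lam, rng):
    """Sum of squares of n unit-variance Gaussians with means (sqrt(lam), 0, ..., 0)."""
    first = rng.gauss(math.sqrt(lam), 1.0) ** 2
    return first + sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n - 1))
```

Averaging many draws from either sampler should give approximately n + λ, the mean of the noncentral χ² distribution, and histograms of the two samplers should look alike.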


The density of the χ²_{2,λ} distribution is

\[
f_{\chi^2_{2,\lambda}}(x) = \frac{1}{2} \, e^{-(\lambda + x)/2} \; I_0\bigl( \sqrt{\lambda x} \bigr) \; \mathrm{I}\{x > 0\}, \tag{19.48}
\]

where I_0(·) is the modified zeroth-order Bessel function, which is defined in (27.47)
ahead.


The generalized Rice distribution corresponds to the distribution of the square
root of a noncentral χ² distribution with n degrees of freedom and noncentrality
parameter λ. The case n = 2 is called the Rice distribution. The Rice distribution
is thus the distribution of the square root of a random variable having the
noncentral χ² distribution with 2 degrees of freedom and noncentrality parameter λ. The
density of the Rice distribution is

\[
f_{\sqrt{\chi^2_{2,\lambda}}}(x) = x \, e^{-(x^2 + \lambda)/2} \; I_0\bigl( x \sqrt{\lambda} \bigr) \; \mathrm{I}\{x > 0\}. \tag{19.49}
\]


The following property of the noncentral χ² is useful in detection theory. In the
statistics literature this property is called the Monotone Likelihood Ratio
property (Lehmann and Romano, 2005, Section 3.4). Alternatively, it is called the Total
Positivity of Order 2 of the function (x, λ) ↦ f_{χ²_{n,λ}}(x).


Proposition 19.8.3 (The Noncentral χ² Family Has Monotone Likelihood Ratio).
Let f_{χ²_{n,λ}}(ξ) denote the density at ξ of the noncentral χ² distribution with n degrees
of freedom and noncentrality parameter λ ≥ 0; see (19.46). Then for ξ_0, ξ_1 > 0
and λ_0, λ_1 ≥ 0 we have

\[
\bigl( \xi_0 \le \xi_1 \text{ and } \lambda_0 \le \lambda_1 \bigr) \implies \Bigl( f_{\chi^2_{n,\lambda_1}}(\xi_0) \, f_{\chi^2_{n,\lambda_0}}(\xi_1) \le f_{\chi^2_{n,\lambda_0}}(\xi_0) \, f_{\chi^2_{n,\lambda_1}}(\xi_1) \Bigr), \tag{19.50}
\]

i.e.,

\[
\bigl( \lambda_1 > \lambda_0 \bigr) \implies \Biggl( \xi \mapsto \frac{f_{\chi^2_{n,\lambda_1}}(\xi)}{f_{\chi^2_{n,\lambda_0}}(\xi)} \text{ is nondecreasing in } \xi > 0 \Biggr). \tag{19.51}
\]


Proof. See, for example, (Finner and Roters, 1997, Proposition 3.8). 


19.9 The Limit of Gaussians Is Gaussian 


There are a number of useful definitions of convergence for sequences of random 
variables. Here we briefly mention a few and show that, under each of these defi- 
nitions, the convergence of a sequence of Gaussian random variables to a random 
variable X implies that X is Gaussian. 


Let the random variables X, X_1, X_2, ... be defined over a common probability space
(Ω, F, P). We say that the sequence X_1, X_2, ... converges to X with probability
one or almost surely if

\[
\Pr\Bigl( \bigl\{ \omega \in \Omega : \lim_{n \to \infty} X_n(\omega) = X(\omega) \bigr\} \Bigr) = 1. \tag{19.52}
\]

Thus, the sequence X_1, X_2, ... converges to X almost surely if there exists an event
N ∈ F of probability zero such that for every ω ∉ N the sequence of real numbers
X_1(ω), X_2(ω), ... converges to the real number X(ω).


The sequence X_1, X_2, ... converges to X in probability if

\[
\lim_{n \to \infty} \Pr\bigl[ |X_n - X| \ge \epsilon \bigr] = 0, \qquad \epsilon > 0. \tag{19.53}
\]

The sequence X_1, X_2, ... converges to X in mean square if

\[
\lim_{n \to \infty} \mathrm{E}\bigl[ (X_n - X)^2 \bigr] = 0. \tag{19.54}
\]


We refer the reader to (Shiryaev, 1996, Ch. II, Section 10, Theorem 2) for a proof 
that convergence in mean-square implies convergence in probability and for a proof 
that almost-sure convergence implies convergence in probability. Also, if a sequence 
converges in probability to X, then it has a subsequence that converges to X with 
probability one (Shiryaev, 1996, Ch. II, Section 10, Theorem 5). 


Theorem 19.9.1. Let the random variables X, X_1, X_2, ... be defined over a common
probability space (Ω, F, P). Assume that each of the random variables X_1, X_2, ...
is Gaussian. If the sequence X_1, X_2, ... converges to X in the sense of (19.52) or
(19.53) or (19.54), then X must also be Gaussian.


Proof. Since both mean-square convergence and almost-sure convergence imply
convergence in probability, it suffices to prove the theorem in the case where the
sequence X_1, X_2, ... converges to X in probability. And since every sequence
converging to X in probability has a subsequence converging to X almost surely, it
suffices to prove the theorem for almost sure convergence. Our proof for this case
follows (Shiryaev, 1996, Ch. II, Section 13, Paragraph 5).


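Since the random variables X_1, X_2, ... are all Gaussian, it follows from (19.29) that

\[
\mathrm{E}\bigl[ e^{i\varpi X_n} \bigr] = e^{i\varpi\mu_n - \frac{1}{2}\varpi^2\sigma_n^2}, \qquad \varpi \in \mathbb{R}, \tag{19.55}
\]

where μ_n and σ_n² are the mean and variance of X_n. By the Dominated Convergence
Theorem it follows that the almost sure convergence of X_1, X_2, ... to X implies
that

\[
\lim_{n \to \infty} \mathrm{E}\bigl[ e^{i\varpi X_n} \bigr] = \mathrm{E}\bigl[ e^{i\varpi X} \bigr], \qquad \varpi \in \mathbb{R}. \tag{19.56}
\]

It follows from (19.55) and (19.56) that

\[
\lim_{n \to \infty} e^{i\varpi\mu_n - \frac{1}{2}\varpi^2\sigma_n^2} = \mathrm{E}\bigl[ e^{i\varpi X} \bigr], \qquad \varpi \in \mathbb{R}. \tag{19.57}
\]

The limit in (19.57) can exist for every ϖ ∈ R only if there exist μ, σ² such that
μ_n → μ and σ_n² → σ². And in this case, by (19.57),

\[
\mathrm{E}\bigl[ e^{i\varpi X} \bigr] = e^{i\varpi\mu - \frac{1}{2}\varpi^2\sigma^2}, \qquad \varpi \in \mathbb{R},
\]

so, by Proposition 19.7.1 (ii) and by (19.29), X is N(μ, σ²).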


Another type of convergence is convergence in distribution or weak
convergence, which is defined as follows. Let F_1, F_2, ... denote the cumulative
distribution functions of the sequence of random variables X_1, X_2, ... We say that the
sequence F_1, F_2, ... (or sometimes X_1, X_2, ...) converges in distribution to the
cumulative distribution function F(·) if F_n(ξ) converges to F(ξ) at every point ξ ∈ R
at which F(·) is continuous. That is,

\[
\Bigl( \lim_{n \to \infty} F_n(\xi) = F(\xi) \Bigr), \qquad \bigl( F(\cdot) \text{ is continuous at } \xi \bigr). \tag{19.58}
\]

Theorem 19.9.2. Let the sequence of random variables X_1, X_2, ... be such that
X_n ~ N(μ_n, σ_n²) for every n ∈ N. Then the sequence converges in distribution to
some limiting distribution if, and only if, there exist some μ and σ² such that

\[
\mu_n \to \mu \quad \text{and} \quad \sigma_n^2 \to \sigma^2. \tag{19.59}
\]

And if the sequence does converge in distribution, then it converges to the mean-μ
variance-σ² Gaussian distribution.


Proof. See (Gikhman and Skorokhod, 1996, Chapter I, Section 3, Theorem 4) 
where this statement is proved in the multivariate case. 


For extensions of Theorems 19.9.1 & 19.9.2 to random vectors, see Theorems 23.9.1
& 23.9.2 in Section 23.9.


19.10 Additional Reading


The Gaussian distribution, its characteristic function, and its moment generating
function appear in almost every basic book on Probability Theory. For more on
the Q-function see (Verdú, 1998, Section 3.3) and (Simon, 2002). For more on
distributions related to the Gaussian distribution see (Simon, 2002), (Johnson,
Kotz, and Balakrishnan, 1994a), and (Johnson, Kotz, and Balakrishnan, 1994b).
For more on the central χ² distribution see (Johnson, Kotz, and Balakrishnan,
1994a, Chapter 18) and (Simon, 2002, Chapter 2). For more on the noncentral χ²
distribution see (Johnson, Kotz, and Balakrishnan, 1994b, Chapter 29) and (Simon,
2002, Chapter 2). Various characterizations of the Gaussian distribution can be
found in (Bryc, 1995) and (Bogachev, 1998).


19.11 Exercises 


Exercise 19.1 (Sums of Independent Gaussians). Let $X_1 \sim \mathcal{N}(0, \sigma_1^2)$ and $X_2 \sim \mathcal{N}(0, \sigma_2^2)$
be independent. Convolve their densities to show that $X_1 + X_2$ is Gaussian.


Exercise 19.2 (Computing Probabilities). Let $X \sim \mathcal{N}(1, 3)$ and $Y \sim \mathcal{N}(-2, 4)$ be
independent. Express the probabilities $\Pr[X < 2]$ and $\Pr[2X + 3Y > -2]$ using the Q-function
with nonnegative arguments.


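Exercise 19.3 (Bounds on the Q-function). Prove (19.19). We suggest changing the
integration variable in (19.9) to $\zeta \triangleq \xi - \alpha$ and then proving and using the inequality

$$1 - \frac{\zeta^2}{2} \le \exp\Bigl(-\frac{\zeta^2}{2}\Bigr) \le 1, \quad \zeta \in \mathbb{R}.$$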


Exercise 19.4 (An Application of Craig’s Formula). Let the random variables $Z \sim \mathcal{N}(0, 1)$
and $A$ be independent, where $A^2$ is of MGF $M_{A^2}(\cdot)$. Show that

$$\Pr\bigl[Z \ge |A|\bigr] = \frac{1}{\pi} \int_0^{\pi/2} M_{A^2}\Bigl(-\frac{1}{2\sin^2\varphi}\Bigr)\, d\varphi.$$

Exercise 19.5 (An Expression for $Q^2(\alpha)$). In analogy to (19.15), derive the identity

$$Q^2(\alpha) = \frac{1}{\pi} \int_0^{\pi/4} e^{-\frac{\alpha^2}{2\sin^2\varphi}}\, d\varphi, \quad \alpha \ge 0.$$


Exercise 19.6 (Expectation of $Q(X)$). Show that for any RV $X$

$$\mathrm{E}\bigl[Q(X)\bigr] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \Pr[X \le \xi]\, e^{-\xi^2/2}\, d\xi.$$

(See (Verdú, 1998, Chapter 3, Section 3.3, Eq. (3.57)).)


Exercise 19.7 (Generating Gaussians from Uniform RVs).

(i) Let $W_1$ and $W_2$ be IID $\mathcal{N}(0, 1)$, and let $R = \sqrt{W_1^2 + W_2^2}$. Show that $R$ has
a Rayleigh distribution, i.e., that its density $f_R(r)$ is given for every $r \in \mathbb{R}$ by
$r e^{-r^2/2}\, \mathrm{I}\{r \ge 0\}$. What is the CDF $F_R(\cdot)$ of $R$?

(ii) Prove that if a RV $X$ is of density $f_X(\cdot)$ and of CDF $F_X(\cdot)$, then $F_X(X) \sim \mathcal{U}(0, 1)$.

(iii) Show that if $U_1$ and $U_2$ are IID $\mathcal{U}(0, 1)$ and if we define $R = \sqrt{\ln\frac{1}{U_1}}$ and $\Theta = 2\pi U_2$,
then $R\cos\Theta$ and $R\sin\Theta$ are IID $\mathcal{N}(0, 1/2)$.


Exercise 19.8 (Infinite Divisibility). Show that for any $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ there exist IID
RVs $X$ and $Y$ such that $X + Y \sim \mathcal{N}(\mu, \sigma^2)$.


Exercise 19.9 (MGF of the Square of a Gaussian). Derive (19.32).


Exercise 19.10 (The Distribution of the Magnitude). Show that if a random variable $X$
is of density $f_X(\cdot)$ and if $Y = |X|$, then the density $f_Y(\cdot)$ of $Y$ is

$$f_Y(y) = \bigl(f_X(y) + f_X(-y)\bigr)\, \mathrm{I}\{y \ge 0\}, \quad y \in \mathbb{R}.$$


Exercise 19.11 (Uniformly Distributed Random Variables). Suppose that $X \sim \mathcal{U}([0, 1])$.

(i) Find the characteristic function $\Phi_X(\cdot)$ of $X$.

(ii) Show that if $X$ and $Y$ are independent with $X$ as above, then $X + Y$ is not Gaussian.


Exercise 19.12 (Sums and Differences of IID RVs). Let $X$ and $Y$ be IID random variables
with finite variances. Show that if $X + Y$ and $X - Y$ are independent, then $X$ and $Y$ are
Gaussian.


(See (Feller, 1971, Chapter III, Section 4).) 


Chapter 20 


Binary Hypothesis Testing 


20.1 Introduction


In Digital Communications the task of the receiver is to observe the channel out- 
puts and to use these observations to accurately guess the data bits that were sent 
by the transmitter, i.e., the data bits that were fed to the modulator. Ideally, the 
guessing would be perfect, i.e., the receiver would make no errors. This, alas, is 
typically impossible because of the distortions and noise that the channel intro- 
duces. Indeed, while one can usually recover the data bits from the transmitted 
waveform (provided that the modulator is a one-to-one mapping), the receiver has 
no access to the transmitted waveform but only to the received waveform. And 
since the latter is typically a noisy version of the former, some errors are usually 
unavoidable. 


In this chapter we shall begin our study of how to guess intelligently, i.e., how, 
given the channel output, one should guess the data bits with as low a probability 
of error as possible. This study will help us not only in the design of receivers but 
also in the design of modulators that allow for reliable decoding from the channel’s 
output. 


In the engineering literature the process of guessing the data bits based on the 
channel output is called “decoding.” In the statistics literature this process is 
called “hypothesis testing.” We like “guessing” because it demystifies the process. 


In most applications the channel output is a continuous-time waveform and we seek 
to decode a large number of bits. Nevertheless, for pedagogical reasons, we shall 
begin our study with the simpler case where we wish to decode only a single data 
bit. This corresponds in the statistics literature to “binary hypothesis testing,” 
where the term “binary” reminds us that in this guessing problem there are only 
two alternatives. Moreover, we shall assume that the observation, rather than 
being a continuous-time waveform, is a vector or a scalar. In fact, we shall begin 
our study with the simplest case where there are no observations at all. 

20.2 Problem Formulation


In choosing a guessing strategy to minimize the probability of error, the labels 
of the two alternatives are immaterial. The principles that guide us in guessing 
the outcome of a fair coin toss (where the labels are “heads” or “tails”) are the 
same as for guessing the value of a random variable that takes on the values $+1$
and $-1$ equiprobably. (These are, of course, extremely simple cases that can be
handled with common sense.) Statisticians typically denote the two alternatives
by $H_0$ and $H_1$ and call them “hypotheses.” We shall denote the two alternatives
by 0 and 1. We thus envision guessing the value of a random variable $H$ taking
value in the set $\{0, 1\}$ with probabilities

$$\pi_0 = \Pr[H = 0], \qquad \pi_1 = \Pr[H = 1]. \tag{20.1}$$

The prior is the distribution of $H$ or the pair $(\pi_0, \pi_1)$. It reflects the state of our
knowledge about $H$ before having made any observations. We say that the prior
is nondegenerate if

$$\pi_0, \pi_1 > 0. \tag{20.2}$$

(If the prior is degenerate, then $H$ is deterministic and we can determine its value
without any observation. For example, if $\pi_0 = 0$ we always guess 1 and never err.)
The prior is uniform if $\pi_0 = \pi_1 = 1/2$.


Aiding us in the guess work is the observation $\mathbf{Y}$, which is a random vector taking
value in $\mathbb{R}^d$. (When $d = 1$ the observation is a random variable and we denote it
by $Y$.) We assume that $\mathbf{Y}$ is a column vector, so, using the notation of Section 17.2,


$$\mathbf{Y} = \bigl(Y^{(1)}, \ldots, Y^{(d)}\bigr)^{\mathsf{T}}.$$


Typically there is some statistical dependence between Y and H; otherwise, Y 
would be useless. If the dependence is so strong that from Y one can deduce H, 
then our guess work is very easy: we simply compute from Y the value of H and 
declare the result as our guess; we never err. The cases of most interest to us 
are therefore those where Y neither determines H nor is statistically independent 
of $H$. Unless otherwise specified, we shall assume that, conditional on $H = 0$,
the observation $\mathbf{Y}$ is of density $f_{\mathbf{Y}|H=0}(\cdot)$ and that, conditional on $H = 1$, it is of
density $f_{\mathbf{Y}|H=1}(\cdot)$. Here $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ are nonnegative Borel measurable
functions from $\mathbb{R}^d$ to $\mathbb{R}$ that integrate to one.¹


Our problem is how to use the observation Y to intelligently guess the value of H. 
At first we shall limit ourselves to deterministic guessing rules. Later we shall 
show that no randomized guessing rule can outperform an optimal deterministic
rule. A deterministic guessing rule (or decision rule, or decoding rule) for
guessing $H$ based on $\mathbf{Y}$ is a (Borel measurable) mapping from the set of possible
observations $\mathbb{R}^d$ to the set $\{0, 1\}$. We denote such a mapping by

$$\phi_{\text{Guess}} \colon \mathbb{R}^d \to \{0, 1\} \tag{20.3}$$

and say that $\phi_{\text{Guess}}(\mathbf{y}_{\text{obs}})$ is the guess we make after having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$.

¹ Readers who are familiar with Measure Theory should note that these are densities with
respect to the Lebesgue measure on $\mathbb{R}^d$, but that the reference measure is inessential to our
analysis. We could have also chosen as our reference measure the sum of the probability measures
on $\mathbb{R}^d$ corresponding to $H = 0$ and to $H = 1$. This would have guaranteed the existence of the
densities.

The probability of error associated with the guessing rule $\phi_{\text{Guess}}(\cdot)$ is

$$\Pr(\text{error}) = \Pr\bigl[\phi_{\text{Guess}}(\mathbf{Y}) \ne H\bigr]. \tag{20.4}$$


Note that two sources of randomness determine whether the guessing rule $\phi_{\text{Guess}}(\cdot)$
errs or not: the realization of $H$ and the generation of $\mathbf{Y}$ conditional on that
realization. We say that a guessing rule is optimal if no other guessing rule
attains a smaller probability of error. (We shall later see that there always exists
an optimal guessing rule.²) In general, there may be a number of different optimal
guessing rules. We shall therefore try to refrain from speaking of the optimal
guessing rule. We apologize if this results in cumbersome writing. The probability
of error associated with optimal guessing rules is the optimal probability of
error and is denoted throughout by

$$p^*(\text{error}).$$


20.3 Guessing in the Absence of Observables 


We begin with the simplest case where there are no observables. Common sense
dictates that in this case we should base our guess on the prior $(\pi_0, \pi_1)$ as follows.
If $\pi_0 > \pi_1$, then we should guess that the value of $H$ is 0; if $\pi_0 < \pi_1$, then we
should guess the value 1; and if $\pi_0 = \pi_1 = 1/2$, then it does not really matter what
we guess: the probability of error will be $1/2$ either way.


To verify that this intuition is correct note that, since there are no observables, 
there are only two guessing rules: the rule “guess 0” and the rule “guess 1.” The 
former results in the probability of error $\pi_1$ (it is in error whenever $H = 1$, which
happens with probability $\pi_1$), and the latter results in the probability of error $\pi_0$.
Hence the former rule is optimal if $\pi_0 \ge \pi_1$, and the latter is optimal when $\pi_1 \ge \pi_0$.
When $\pi_0 = \pi_1$ both rules are optimal and we can use either one.


We summarize that, in the absence of observations, an optimal guessing rule is: 


$$\phi^*_{\text{Guess}} = \begin{cases} 0 & \text{if } \Pr[H = 0] \ge \Pr[H = 1], \\ 1 & \text{otherwise}. \end{cases} \tag{20.5}$$


(Here we guess 0 also when Pr[H = 0] = Pr[H = 1]. An equally good rule would 
guess 1 in this case.) 


As we next show, the error probability $p^*(\text{error})$ of this rule is

$$p^*(\text{error}) = \min\bigl\{\Pr[H = 0], \Pr[H = 1]\bigr\}. \tag{20.6}$$

This can be verified by considering the case where $\Pr[H = 0] \ge \Pr[H = 1]$ and the
case where $\Pr[H = 0] < \Pr[H = 1]$ separately. By (20.5), in the former case our
guess is 0 with the associated probability of error $\Pr[H = 1]$, whereas in the latter
case our guess is 1 with the associated probability of error $\Pr[H = 0]$. In either
case the probability of error is given by the RHS of (20.6).

² Thus, while there is no such thing as “smallest strictly positive number,” i.e., a positive
number that is smaller-or-equal to any other positive number, we shall see that there always
exists a guessing rule that no other guessing rule can outperform. Mathematicians paraphrase
this by saying that “the infimum of the probability of error over all the guessing rules is achievable,
i.e., is a minimum.”
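
As a small numerical illustration of the rule (20.5) and of the error probability (20.6), the following Python sketch implements guessing in the absence of observables. The function names and the prior $(0.7, 0.3)$ are our own illustrative choices and do not appear in the text.

```python
def guess_without_observation(pi0: float, pi1: float) -> int:
    """Rule (20.5): guess 0 if Pr[H=0] >= Pr[H=1], and guess 1 otherwise."""
    return 0 if pi0 >= pi1 else 1


def error_of_constant_guess(guess: int, pi0: float, pi1: float) -> float:
    """Guessing 0 errs exactly when H = 1; guessing 1 errs exactly when H = 0."""
    return pi1 if guess == 0 else pi0


pi0, pi1 = 0.7, 0.3  # illustrative prior (an assumption, not from the text)
best_guess = guess_without_observation(pi0, pi1)
p_star = error_of_constant_guess(best_guess, pi0, pi1)
print(best_guess, p_star, min(pi0, pi1))
```

The last two printed values coincide, in agreement with (20.6).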


20.4 The Joint Law of H and Y 


Before we can extend the results of Section 20.3 to the more interesting case where 
we guess H after observing Y, we pause to discuss the joint distribution of H 
and Y. This joint distribution is needed in order to derive an optimal decision rule 
and in order to analyze its performance. Some care must be exercised in describing 
this law because H is discrete (binary) and Y has a density. It is usually simplest 
to describe the joint law by describing the prior (the distribution of H), and by 
then describing the conditional law of Y given H = 0 and the conditional law of Y 
given H = 1. 


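If, conditional on $H = 0$, the distribution of $\mathbf{Y}$ has the density $f_{\mathbf{Y}|H=0}(\cdot)$ and if,
conditional on $H = 1$, the distribution of $\mathbf{Y}$ has the density $f_{\mathbf{Y}|H=1}(\cdot)$, then the
joint distribution of $H$ and $\mathbf{Y}$ can be described using the prior $(\pi_0, \pi_1)$ (20.1) and
the conditional densities

$$f_{\mathbf{Y}|H=0}(\cdot) \quad \text{and} \quad f_{\mathbf{Y}|H=1}(\cdot). \tag{20.7}$$

From the prior $(\pi_0, \pi_1)$ and the conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$, $f_{\mathbf{Y}|H=1}(\cdot)$ we can
compute the (unconditional) density of $\mathbf{Y}$:

$$f_{\mathbf{Y}}(\mathbf{y}) = \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d. \tag{20.8}$$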


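The conditional distribution of $H$ given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ is a bit more tricky because
the probability of $\mathbf{Y}$ taking on the value $\mathbf{y}_{\text{obs}}$ (exactly) is zero. There are two
approaches to defining $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ in this case: the heuristic one that is
usually used in a first course on probability theory and the measure-theoretic one
that was pioneered by Kolmogorov. Our approach is to define this quantity in a
way that will be palatable to both mathematicians and engineers and to then give
a heuristic justification for our definition.


We define the conditional probability that $H = 0$ given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ as

$$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \triangleq \begin{cases} \dfrac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0, \\ \dfrac{1}{2} & \text{otherwise}, \end{cases} \tag{20.9a}$$

where $f_{\mathbf{Y}}(\cdot)$ is given in (20.8), and analogously

$$\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \triangleq \begin{cases} \dfrac{\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0, \\ \dfrac{1}{2} & \text{otherwise}. \end{cases} \tag{20.9b}$$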


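Notice that our definition is meaningful in the sense that the values we assign to
$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ are nonnegative and sum to one:

$$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] + \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = 1, \quad \mathbf{y}_{\text{obs}} \in \mathbb{R}^d. \tag{20.10}$$

Also note that our definition of $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$
for those $\mathbf{y}_{\text{obs}} \in \mathbb{R}^d$ for which $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) = 0$ is quite arbitrary; we chose $1/2$ just
for concreteness.³ Indeed, it is not difficult to verify that the probability that $\mathbf{y}_{\text{obs}}$
satisfies $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) = 0$ is zero, and hence our definitions in
this eventuality are not important; see (20.12) ahead.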
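
The definition (20.9), including the arbitrary value $1/2$ where the density of the observation vanishes, can be sketched in Python as follows. The Gaussian conditional densities (means $\pm 1$, unit variance), the observed value 0.4, and all function names are assumptions made only for this illustration.

```python
import math


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    """Density of a mean-`mean`, variance-`var` Gaussian evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posterior(pi0: float, f0_at_y: float, f1_at_y: float) -> tuple[float, float]:
    """Return (Pr[H=0 | Y=y], Pr[H=1 | Y=y]) according to (20.9)."""
    pi1 = 1.0 - pi0
    f_y = pi0 * f0_at_y + pi1 * f1_at_y  # unconditional density (20.8)
    if f_y > 0.0:
        return pi0 * f0_at_y / f_y, pi1 * f1_at_y / f_y
    return 0.5, 0.5  # arbitrary convention where the density of Y is zero


y_obs = 0.4  # illustrative observation
p0, p1 = posterior(0.5, gaussian_pdf(y_obs, 1.0, 1.0), gaussian_pdf(y_obs, -1.0, 1.0))
print(p0, p1, p0 + p1)
```

The two returned values are nonnegative and sum to one, as required by (20.10).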


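If $d = 1$, then the observation is a random variable $Y$ and a heuristic way to
motivate (20.9a) is to consider the limit

$$\lim_{\delta \downarrow 0} \frac{\Pr\bigl[H = 0,\; Y \in (y_{\text{obs}} - \delta, y_{\text{obs}} + \delta)\bigr]}{\Pr\bigl[Y \in (y_{\text{obs}} - \delta, y_{\text{obs}} + \delta)\bigr]}. \tag{20.11}$$

Assuming some regularity of the conditional densities (e.g., continuity) we can use
the approximations

$$\Pr\bigl[H = 0,\; Y \in (y_{\text{obs}} - \delta, y_{\text{obs}} + \delta)\bigr] = \pi_0 \int_{y_{\text{obs}} - \delta}^{y_{\text{obs}} + \delta} f_{Y|H=0}(y)\, dy \approx 2 \pi_0 \delta f_{Y|H=0}(y_{\text{obs}}), \quad \delta \ll 1,$$

$$\Pr\bigl[Y \in (y_{\text{obs}} - \delta, y_{\text{obs}} + \delta)\bigr] = \int_{y_{\text{obs}} - \delta}^{y_{\text{obs}} + \delta} f_Y(y)\, dy \approx 2 \delta f_Y(y_{\text{obs}}), \quad \delta \ll 1,$$

to argue that, under suitable regularity conditions, (20.11) agrees with the RHS of
(20.9a) when $f_Y(y_{\text{obs}}) > 0$. A similar calculation can be carried out in the vector
case where $d > 1$.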


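We next remark on observations $\mathbf{y}_{\text{obs}}$ at which the density of $\mathbf{Y}$ is zero. Accounting
for such observations makes the writing a bit cumbersome as in (20.9). Fortunately,
the probability of such observations is zero:


Note 20.4.1. Let $H$ be drawn according to the prior $(\pi_0, \pi_1)$, and let the conditional
densities of $\mathbf{Y}$ given $H$ be $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ with $f_{\mathbf{Y}}(\cdot)$ given in
(20.8). Then

$$\Pr\bigl[\mathbf{Y} \in \{\tilde{\mathbf{y}} \in \mathbb{R}^d : f_{\mathbf{Y}}(\tilde{\mathbf{y}}) = 0\}\bigr] = 0. \tag{20.12}$$

Proof.

$$\begin{aligned} \Pr\bigl[\mathbf{Y} \in \{\tilde{\mathbf{y}} \in \mathbb{R}^d : f_{\mathbf{Y}}(\tilde{\mathbf{y}}) = 0\}\bigr] &= \int_{\{\tilde{\mathbf{y}} \in \mathbb{R}^d : f_{\mathbf{Y}}(\tilde{\mathbf{y}}) = 0\}} f_{\mathbf{Y}}(\mathbf{y})\, d\mathbf{y} \\ &= \int_{\{\tilde{\mathbf{y}} \in \mathbb{R}^d : f_{\mathbf{Y}}(\tilde{\mathbf{y}}) = 0\}} 0\, d\mathbf{y} \\ &= 0, \end{aligned}$$

where the second equality follows because the integrand is zero over the range of
integration.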


³ In the measure-theoretic probability literature our definition is just a “version” (among many
others) of the conditional probabilities of the event $H = 0$ (respectively $H = 1$), conditional on
the $\sigma$-algebra generated by the random vector $\mathbf{Y}$ (Billingsley, 1995, Section 33), (Williams, 1991,
Chapter 9).

We conclude this section with two technical remarks which are trivial if you ignore
observations where $f_{\mathbf{Y}}(\cdot)$ is zero:


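Note 20.4.2. Consider the setup of Note 20.4.1.


(i) For every $\mathbf{y} \in \mathbb{R}^d$

$$\min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\} = \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr\}\, f_{\mathbf{Y}}(\mathbf{y}). \tag{20.13}$$


(ii) For every $\mathbf{y} \in \mathbb{R}^d$

$$\bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr) \iff \bigl(\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr). \tag{20.14}$$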


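Proof. Identity (20.13) can be proved using (20.9) and (20.8) by separately considering
the case $f_{\mathbf{Y}}(\mathbf{y}) > 0$ and the case $f_{\mathbf{Y}}(\mathbf{y}) = 0$ (where the latter is equivalent,
by (20.8), to $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$ and $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ both being zero).

To prove (20.14) we also separately consider the case $f_{\mathbf{Y}}(\mathbf{y}) > 0$ and the case
$f_{\mathbf{Y}}(\mathbf{y}) = 0$. In the former case we note that for $c > 0$ the condition $a \ge b$ is
equivalent to the condition $a/c \ge b/c$, so for $f_{\mathbf{Y}}(\mathbf{y}) > 0$

$$\bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr) \iff \Biggl(\underbrace{\frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}}_{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}]} \ge \underbrace{\frac{\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}}_{\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]}\Biggr).$$

In the latter case where $f_{\mathbf{Y}}(\mathbf{y}) = 0$ we note that, by (20.8), both $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$
and $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ are zero, so the condition on the LHS of (20.14) is true ($0 \ge 0$).
Fortunately, when $f_{\mathbf{Y}}(\mathbf{y}) = 0$ the condition on the RHS of (20.14) is also true,
because in this case (20.9) implies that $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}]$ and $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]$
are both equal to $1/2$ (and $1/2 \ge 1/2$).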

We next derive an optimal rule for guessing $H$ after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$.
We begin with a heuristic argument. Having observed that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, there are
only two possible decision rules: to guess 0 or guess 1. Which should we choose?
The answer now depends on the a posteriori distribution of $H$. Once it has been
revealed to us that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, our outlook changes and we now assign the event
$H = 0$ the a posteriori probability $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and the event $H = 1$ the
complementary probability $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$. If the former is greater than the
latter, then we should guess 0, and otherwise we should guess 1. Thus, after it has
been revealed to us that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ the situation is equivalent to one in which we
need to guess $H$ without any observables and where our distribution on $H$ is not
its a priori distribution (prior) but its a posteriori distribution. Using our analysis
from Section 20.3 we conclude that the guessing rule

$$\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ 1 & \text{otherwise}, \end{cases} \tag{20.15}$$

is optimal. Once again, the way we resolve ties is arbitrary: if the observation
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$ results in the a posteriori distribution of $H$ being uniform, that is, if
$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = 1/2$, then either guess is optimal.
Using Note 20.4.2 (ii) we can also express the decision rule (20.15) as

$$\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ 1 & \text{otherwise}. \end{cases} \tag{20.16}$$


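Conditional on $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the probability of error of the optimal decision rule is,
in analogy to (20.6), given by

$$p^*(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]\bigr\}, \tag{20.17}$$

as can be seen by treating the case $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \ge \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and
the complementary case $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] < \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ separately.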


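The unconditional probability of error associated with the rule (20.15) is thus

\begin{align}
p^*(\text{error}) &= \mathrm{E}\Bigl[\min\bigl\{\Pr[H = 0 \mid \mathbf{Y}],\, \Pr[H = 1 \mid \mathbf{Y}]\bigr\}\Bigr] \tag{20.18} \\
&= \int_{\mathbb{R}^d} \min\bigl\{\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}],\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}]\bigr\}\, f_{\mathbf{Y}}(\mathbf{y})\, d\mathbf{y} \tag{20.19} \\
&= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}, \tag{20.20}
\end{align}

where the last equality follows from Note 20.4.2 (i).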
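
To make (20.20) concrete, the following Python sketch approximates the integral by the trapezoidal rule for a scalar observation. The Gaussian conditional densities (means $\pm 1$, unit variance), the uniform prior, the integration range, and the function names are assumptions of this illustration. For this symmetric pair one can check by hand that (20.20) evaluates to $Q(1)$, which the code uses as a sanity check.

```python
import math


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    """Density of a mean-`mean`, variance-`var` Gaussian evaluated at y."""
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def optimal_error_probability(pi0, f0, f1, lo, hi, steps=20_000):
    """Trapezoidal approximation of (20.20) for a scalar observation (d = 1)."""
    pi1 = 1.0 - pi0
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        y = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0  # end points get half weight
        total += weight * min(pi0 * f0(y), pi1 * f1(y))
    return total * h


def f0(y: float) -> float:
    return gaussian_pdf(y, +1.0, 1.0)  # assumed density of Y given H = 0


def f1(y: float) -> float:
    return gaussian_pdf(y, -1.0, 1.0)  # assumed density of Y given H = 1


p_star = optimal_error_probability(0.5, f0, f1, lo=-12.0, hi=12.0)
q_of_one = 0.5 * math.erfc(1.0 / math.sqrt(2.0))  # Q(1)
print(p_star, q_of_one)
```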


Before summarizing these conclusions in a theorem, we present the following simple 
lemma on the probabilities of error associated with general decision rules. 


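Lemma 20.5.1. Consider the setup of Note 20.4.1. Let $\phi_{\text{Guess}}(\cdot)$ be an arbitrary
guessing rule as in (20.3). Then the probabilities of error $p(\text{error} \mid H = 0)$,
$p(\text{error} \mid H = 1)$, and $p(\text{error})$ associated with $\phi_{\text{Guess}}(\cdot)$ are given by

$$p(\text{error} \mid H = 0) = \int_{\mathbf{y} \notin \mathcal{D}} f_{\mathbf{Y}|H=0}(\mathbf{y})\, d\mathbf{y}, \tag{20.21}$$

$$p(\text{error} \mid H = 1) = \int_{\mathbf{y} \in \mathcal{D}} f_{\mathbf{Y}|H=1}(\mathbf{y})\, d\mathbf{y}, \tag{20.22}$$

and

$$p(\text{error}) = \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\}\Bigr)\, d\mathbf{y}, \tag{20.23}$$

where

$$\mathcal{D} = \bigl\{\mathbf{y} \in \mathbb{R}^d : \phi_{\text{Guess}}(\mathbf{y}) = 0\bigr\}. \tag{20.24}$$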

Proof. Conditional on $H = 0$ the guessing rule makes an error only if $\mathbf{Y}$ does not
fall in the set of observations for which $\phi_{\text{Guess}}(\cdot)$ produces the guess “$H = 0$.” This
establishes (20.21). A similar argument proves (20.22). Finally, (20.23) follows
from (20.21) & (20.22) using the identity

$$p(\text{error}) = \pi_0\, p(\text{error} \mid H = 0) + \pi_1\, p(\text{error} \mid H = 1).$$


We next state the key result about binary hypothesis testing. The statement is a 
bit cumbersome because, in general, there may be many observations that result 
in H being a posteriori uniformly distributed, and an optimal decision rule can 
map each such observation to a different guess and still be optimal. 


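Theorem 20.5.2 (Optimal Binary Hypothesis Testing). Suppose that a guessing
rule $\phi^*_{\text{Guess}} \colon \mathbb{R}^d \to \{0, 1\}$ produces the guess “$H = 0$” only when $\mathbf{y}_{\text{obs}}$ is such that
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$, i.e.,

$$\bigl(\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = 0\bigr) \implies \bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})\bigr), \tag{20.25a}$$

and produces the guess “$H = 1$” only when $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$,
i.e.,

$$\bigl(\phi^*_{\text{Guess}}(\mathbf{y}_{\text{obs}}) = 1\bigr) \implies \bigl(\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})\bigr). \tag{20.25b}$$

Then no other guessing rule has a smaller probability of error, and

$$\Pr\bigl[\phi^*_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] = \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}. \tag{20.26}$$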

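Proof. Let $\phi_{\text{Guess}} \colon \mathbb{R}^d \to \{0, 1\}$ be any guessing rule, and let

$$\mathcal{D} = \bigl\{\mathbf{y} \in \mathbb{R}^d : \phi_{\text{Guess}}(\mathbf{y}) = 0\bigr\} \tag{20.27}$$

be the set of observations that result in $\phi_{\text{Guess}}(\cdot)$ producing the guess “$H = 0$.”
Then the probability of error associated with $\phi_{\text{Guess}}(\cdot)$ can be lower-bounded by

$$\begin{aligned} \Pr\bigl[\phi_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] &= \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\}\Bigr)\, d\mathbf{y} \\ &\ge \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}, \end{aligned} \tag{20.28}$$

where the equality follows from Lemma 20.5.1 and where the inequality follows
because for every value of $\mathbf{y} \in \mathbb{R}^d$

$$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}\} \ge \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}, \tag{20.29}$$

as can be verified by noting that, irrespective of the set $\mathcal{D}$, one of the two terms
$\mathrm{I}\{\mathbf{y} \in \mathcal{D}\}$ and $\mathrm{I}\{\mathbf{y} \notin \mathcal{D}\}$ is equal to one and the other is equal to zero, so the LHS of
(20.29) is either equal to $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})$ or to $\pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ and hence lower-bounded
by $\min\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\}$.

We prove the optimality of $\phi^*_{\text{Guess}}(\cdot)$ by next showing that the probability of error
associated with $\phi^*_{\text{Guess}}(\cdot)$ is equal to the RHS of (20.28). To this end we define

$$\mathcal{D}^* = \bigl\{\mathbf{y} \in \mathbb{R}^d : \phi^*_{\text{Guess}}(\mathbf{y}) = 0\bigr\} \tag{20.30}$$

and note that if both (20.25a) and (20.25b) hold, then

$$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}^*\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}^*\} = \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}, \quad \mathbf{y} \in \mathbb{R}^d. \tag{20.31}$$

Applying Lemma 20.5.1 to the decoder $\phi^*_{\text{Guess}}(\cdot)$ we obtain

$$\begin{aligned} \Pr\bigl[\phi^*_{\text{Guess}}(\mathbf{Y}) \ne H\bigr] &= \int_{\mathbb{R}^d} \Bigl(\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \notin \mathcal{D}^*\} + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{\mathbf{y} \in \mathcal{D}^*\}\Bigr)\, d\mathbf{y} \\ &= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}),\, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\}\, d\mathbf{y}, \end{aligned} \tag{20.32}$$

where the second equality follows from (20.31). The theorem now follows from
(20.28) and (20.32).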
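
The lower bound (20.28) can be checked numerically on a simple family of guessing rules. The sketch below assumes, purely for illustration, that $Y \sim \mathcal{N}(+1, 1)$ under $H = 0$ and $Y \sim \mathcal{N}(-1, 1)$ under $H = 1$, with prior $\pi_0 = 0.8$. It evaluates via Lemma 20.5.1 the probability of error of every threshold rule “guess 0 if, and only if, $y \ge t$” on a grid and compares it with the rule whose threshold solves $\pi_0 f_{Y|H=0}(t) = \pi_1 f_{Y|H=1}(t)$, which satisfies (20.25). The function names and the grid are our own choices.

```python
import math


def q_function(x: float) -> float:
    """Q(x), the probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def threshold_rule_error(t: float, pi0: float) -> float:
    """p(error) via Lemma 20.5.1 for the rule 'guess 0 iff y >= t',
    assuming Y ~ N(+1, 1) under H = 0 and Y ~ N(-1, 1) under H = 1."""
    pi1 = 1.0 - pi0
    # H = 0 is missed when Y < t; H = 1 is missed when Y >= t.
    return pi0 * q_function(1.0 - t) + pi1 * q_function(1.0 + t)


pi0 = 0.8  # illustrative nonuniform prior
t_star = 0.5 * math.log((1.0 - pi0) / pi0)  # solves pi0 f0(t) = pi1 f1(t)
p_star = threshold_rule_error(t_star, pi0)
smallest_gap = min(
    threshold_rule_error(k / 100.0, pi0) - p_star for k in range(-300, 301)
)
print(t_star, p_star, smallest_gap)
```

No threshold on the grid does better than `t_star`, so `smallest_gap` is nonnegative up to rounding.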


Referring to a situation where the observation results in the a posteriori distribu- 
tion of H being uniform as a tie we have: 


Note 20.5.3. The fact that both conditional on H = 0 and conditional on H = 1 
the observation Y has a density does not imply that the probability of a tie is zero. 


For example, if $H$ takes value in $\{0, 1\}$ equiprobably, and if the observation $Y$ is
given by $Y = H + U$, where $U$ is uniformly distributed over the interval $[-2, 2]$
independently of $H$, then the a posteriori distribution of $H$ is uniform whenever
$Y \in [-1, 2]$, and this occurs with probability $3/4$.
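
The probability $3/4$ in this example can be confirmed numerically. The following Python sketch integrates the density of $Y$ over those observations at which $\pi_0 f_{Y|H=0}(y) = \pi_1 f_{Y|H=1}(y) > 0$, using a midpoint rule. The function names, the grid, and the integration range are our own choices.

```python
def f_given_h0(y: float) -> float:
    """Density of Y = 0 + U, with U uniform over [-2, 2]."""
    return 0.25 if -2.0 <= y <= 2.0 else 0.0


def f_given_h1(y: float) -> float:
    """Density of Y = 1 + U, with U uniform over [-2, 2]."""
    return 0.25 if -1.0 <= y <= 3.0 else 0.0


def tie_probability(steps: int = 70_000) -> float:
    """Midpoint-rule integral of f_Y over the observations that yield a tie."""
    pi0 = pi1 = 0.5
    lo, hi = -3.0, 4.0  # covers the supports of both conditional densities
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * h
        a, b = pi0 * f_given_h0(y), pi1 * f_given_h1(y)
        if a > 0.0 and a == b:  # uniform a posteriori distribution
            total += (a + b) * h  # f_Y(y) dy, see (20.8)
    return total


p_tie = tie_probability()
print(p_tie)
```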


20.6 Randomized Decision Rules 


So far we have restricted ourselves to deterministic decision rules, where the guess 
is a deterministic function of the observation. We next remove this restriction and 
allow for some randomization in the decision rule. As we shall see in this section 
and in greater generality in Section 20.11, when properly defined, randomization 
does not help: the lowest probability of error that is achievable with randomized 
decision rules can also be achieved with deterministic decision rules. 


By a randomized decision rule we mean that, after observing that $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the
guesser chooses some bias $b(\mathbf{y}_{\text{obs}}) \in [0, 1]$ and then tosses a coin of that bias.
If the result is “heads” it guesses 0 and otherwise it guesses 1. Note that the
deterministic rules we have considered before are special cases of the randomized
ones: any deterministic decision rule can be viewed as a randomized decision rule
where, depending on $\mathbf{y}_{\text{obs}}$, the bias $b(\mathbf{y}_{\text{obs}})$ is either zero or one.

[Figure: block diagram with the blocks “Bias Calculator” and “Random Number Generator”; the outcome $\Theta$ of the generator is compared with the bias $b(\mathbf{y}_{\text{obs}})$ to produce the guess.]

Figure 20.1: A block diagram of a randomized decision rule.


Some care must be exercised in defining the joint distribution of the coin toss with
the other variables $(H, \mathbf{Y})$. We do not want to allow for “telepathic coins.” That is,
we want to make sure that once $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ has been observed and the bias $b(\mathbf{y}_{\text{obs}})$
has been accordingly computed, the outcome of the coin toss is random, i.e., has
nothing to do with $H$. Probabilists would say that we require that, conditional on
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the outcome of the coin toss be independent of $H$. (We shall discuss
conditional independence in Section 20.11.) We can clarify the setting as follows.
Upon observing the outcome $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the guesser computes the bias $b(\mathbf{y}_{\text{obs}})$.
Using a local random number generator the guesser then draws a random variable $\Theta$
uniformly over the interval $[0, 1]$, independently of the pair $(H, \mathbf{Y})$. If the outcome $\theta$
is smaller than $b(\mathbf{y}_{\text{obs}})$, then it guesses “$H = 0$,” and otherwise it guesses “$H = 1$.”
A randomized decision rule is depicted in Figure 20.1.


We offer two proofs that randomized decision rules cannot outperform the best
deterministic ones. The first is by straightforward calculation. Conditional on
$\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the randomized guesser makes an error either if $\Theta < b(\mathbf{y}_{\text{obs}})$ (resulting
in the guess “$H = 0$”) while $H = 1$, or if $\Theta \ge b(\mathbf{y}_{\text{obs}})$ (resulting in the guess
“$H = 1$”) while $H = 0$. Consequently,

$$\Pr(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}) = b(\mathbf{y}_{\text{obs}}) \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] + \bigl(1 - b(\mathbf{y}_{\text{obs}})\bigr) \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]. \tag{20.33}$$

Thus, $\Pr(\text{error} \mid \mathbf{Y} = \mathbf{y}_{\text{obs}})$ is a weighted average of $\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$ and
$\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$. As such, irrespective of the weights, it cannot be smaller
than the minimum of the two. But, by (20.17), the optimal deterministic decision
rule (20.15) achieves just this minimum. We conclude that, irrespective of the bias,
for each outcome $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ the conditional probability of error of the randomized
decoder is lower-bounded by that of the optimal deterministic decoder (20.15).
Since this is the case for every outcome, it must also be the case when we average
over the outcomes. This concludes the first proof.
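
The weighted-average argument of the first proof can be illustrated with a few lines of Python. The a posteriori probabilities $(0.7, 0.3)$ and the function name are hypothetical choices made for this sketch only.

```python
def randomized_conditional_error(bias: float, post0: float, post1: float) -> float:
    """Conditional probability of error (20.33): with probability `bias` the guess
    is 0 (an error if H = 1); with probability 1 - bias it is 1 (an error if H = 0)."""
    return bias * post1 + (1.0 - bias) * post0


post0, post1 = 0.7, 0.3  # hypothetical a posteriori probabilities of H = 0 and H = 1
deterministic_error = min(post0, post1)  # achieved by the rule (20.15), see (20.17)
errors = [randomized_conditional_error(k / 10.0, post0, post1) for k in range(11)]
print(deterministic_error, min(errors), max(errors))
```

For every bias on the grid the conditional error is at least `deterministic_error`, with equality at the bias 1, which corresponds to always guessing 0.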
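
In the second proof we view the outcome of the local random number generator $\Theta$
as an additional observation. Since it is independent of $(H, \mathbf{Y})$ and since it is
uniform over $[0, 1]$,

$$\begin{aligned} f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta) &= f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\Theta|\mathbf{Y}=\mathbf{y},H=0}(\theta) \\ &= f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\Theta}(\theta) \\ &= f_{\mathbf{Y}|H=0}(\mathbf{y})\, \mathrm{I}\{0 \le \theta \le 1\}, \end{aligned} \tag{20.34a}$$

and similarly

$$f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta) = f_{\mathbf{Y}|H=1}(\mathbf{y})\, \mathrm{I}\{0 \le \theta \le 1\}. \tag{20.34b}$$

Since the randomized decision rule can be viewed as a deterministic decision
rule that is based on the pair $(\mathbf{Y}, \Theta)$, it cannot outperform any optimal
deterministic guessing rule based on $(\mathbf{Y}, \Theta)$. But by Theorem 20.5.2 and (20.34)
it follows that the deterministic decision rule that guesses “$H = 0$” whenever
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})$ is optimal not only for guessing $H$ based on $\mathbf{Y}$ but
also for guessing $H$ based on $(\mathbf{Y}, \Theta)$, because it produces the guess “$H = 0$” only
when $\pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta) \ge \pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta)$ and it produces the guess “$H = 1$”
only when $\pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}, \theta) \ge \pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}, \theta)$. This concludes the second proof.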


Even though randomized decision rules cannot outperform the best deterministic 
rules, they may have other advantages. For example, they allow for more symmetric 
ways of resolving ties. Suppose, for example, that we have no observations and that 
the prior is uniform. In this case guessing “H = 0” will give rise to a probability of 
error of 1/2, with an error occurring whenever H = 1. Similarly guessing “H = 1” 
will also result in a probability of error of 1/2, this time with an error occurring 
whenever H = 0. If we think about H as being an information bit, then the former 
rule makes sending 0 less error prone than sending 1. A randomized test that flips 
a fair coin and guesses 0 if “heads” and 1 if “tails” gives rise to the same average 
probability of error (i.e., 1/2) and makes sending 0 and sending 1 equally (highly)
error prone. 


If $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ results in a tie, i.e., if it yields a uniform a posteriori distribution
on $H$,

$$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \frac{1}{2},$$

then the probability of error of the randomized decoder (20.33) does not depend on
the bias. In this case there is thus no loss in optimality in choosing $b(\mathbf{y}_{\text{obs}}) = 1/2$,
i.e., in employing a fair coin. This makes for a symmetric way of resolving the tie
in the a posteriori distribution of $H$.
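
The symmetry gained by a fair coin can be seen in the no-observation example with a uniform prior discussed above. The Python sketch below tabulates the two conditional probabilities of error and their average for the rule “always guess 0” (bias 1) and for a fair coin (bias 1/2). The function names are our own.

```python
def conditional_errors(bias: float) -> tuple[float, float]:
    """No observation: guess 0 with probability `bias` and 1 otherwise.
    Returns (p(error | H = 0), p(error | H = 1))."""
    return 1.0 - bias, bias


def average_error(bias: float, pi0: float = 0.5) -> float:
    """Average probability of error under the prior (pi0, 1 - pi0)."""
    e0, e1 = conditional_errors(bias)
    return pi0 * e0 + (1.0 - pi0) * e1


for bias in (1.0, 0.5):  # "always guess 0" versus a fair coin
    print(bias, conditional_errors(bias), average_error(bias))
```

Both rules have average probability of error 1/2, but only the fair coin treats the two values of $H$ alike.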


20.7 The MAP Decision Rule 


In Section 20.5 we presented an optimal decision rule (20.15). A slight variation
on that decoder is the Maximum A Posteriori (MAP) decision rule. The MAP
rule is identical to (20.15) except in how it resolves ties. Unlike (20.15), which
resolves ties by guessing “$H = 0$,” the MAP rule resolves ties by flipping a fair
coin. It can thus be summarized as follows:

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] > \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ 1 & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] < \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \\ \mathcal{U}(\{0, 1\}) & \text{if } \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \end{cases} \tag{20.35}$$

where we use “$\mathcal{U}(\{0, 1\})$” to indicate that we guess the outcome uniformly at
random.


Note that, like the rule in (20.15), the MAP rule is optimal. This follows because 
the way ties are resolved does not influence the probability of error, and because 
the MAP rule agrees with the rule (20.15) for all observations which do not result 
in a tie. 


Theorem 20.7.1 (The MAP Rule Is Optimal). The Maximum A Posteriori deci- 
sion rule (20.35) is optimal. 


Since the MAP decoder is optimal,

$$p^*(\text{error}) = \pi_0\, p_{\text{MAP}}(\text{error} \mid H = 0) + \pi_1\, p_{\text{MAP}}(\text{error} \mid H = 1), \tag{20.36}$$

where $p_{\text{MAP}}(\text{error} \mid H = 0)$ and $p_{\text{MAP}}(\text{error} \mid H = 1)$ denote the conditional
probabilities of error for the MAP decoder. Note that one can easily find guessing
rules (such as the rule “always guess 0”) that yield a conditional probability of
error smaller than $p_{\text{MAP}}(\text{error} \mid H = 0)$, but one cannot find a rule whose average
probability of error outperforms the RHS of (20.36).


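Using Note 20.4.2 (ii) we can express the MAP rule in terms of the densities and
the prior as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) > \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ 1 & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) < \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\ \mathcal{U}(\{0, 1\}) & \text{if } \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) = \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}). \end{cases} \tag{20.37}$$

Alternatively, the MAP decision rule can be described using the likelihood-ratio
function $\text{LR}(\cdot)$, which is defined by

$$\text{LR}(\mathbf{y}) \triangleq \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d \tag{20.38}$$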


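using the convention

$$\Bigl(\frac{\alpha}{0} = \infty, \quad \alpha > 0\Bigr) \quad \text{and} \quad \frac{0}{0} = 1. \tag{20.39}$$

Since densities are nonnegative, and since we are defining the likelihood-ratio
function using the convention (20.39), the range of $\text{LR}(\cdot)$ is the set $[0, \infty]$ consisting of
the nonnegative reals and the special symbol $\infty$:

$$\text{LR} \colon \mathbb{R}^d \to [0, \infty].$$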
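
Using the likelihood-ratio function and (20.37), we can rewrite the MAP rule for
the case where the prior is nondegenerate (20.2) and where the observation $\mathbf{y}_{\text{obs}}$ is
such that $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0$ as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) > \frac{\pi_1}{\pi_0}, \\ 1 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) < \frac{\pi_1}{\pi_0}, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) = \frac{\pi_1}{\pi_0}, \end{cases} \qquad \bigl(\pi_0, \pi_1 > 0, \; f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0\bigr). \tag{20.40}$$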

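Since many of the densities that are of interest to us have an exponential form, it
is sometimes more convenient to describe the MAP rule using the log likelihood-ratio
function $\text{LLR} \colon \mathbb{R}^d \to [-\infty, \infty]$, which is defined by

$$\text{LLR}(\mathbf{y}) \triangleq \ln \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d, \tag{20.41}$$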


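using the convention

$$\Bigl(\ln \frac{\alpha}{0} = +\infty, \quad \ln \frac{0}{\alpha} = -\infty, \quad \alpha > 0\Bigr) \quad \text{and} \quad \ln \frac{0}{0} = 0, \tag{20.42}$$

where $\ln(\cdot)$ denotes natural logarithm.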


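Using the log likelihood-ratio function $\text{LLR}(\cdot)$ and the monotonicity of the
logarithmic function

$$(a > b) \iff (\ln a > \ln b), \quad a, b > 0, \tag{20.43}$$

we can express the MAP rule (20.40) as

$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases} 0 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) > \ln \frac{\pi_1}{\pi_0}, \\ 1 & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) < \ln \frac{\pi_1}{\pi_0}, \\ \mathcal{U}(\{0, 1\}) & \text{if } \text{LLR}(\mathbf{y}_{\text{obs}}) = \ln \frac{\pi_1}{\pi_0}, \end{cases} \qquad \bigl(\pi_0, \pi_1 > 0, \; f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0\bigr). \tag{20.44}$$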
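
As an illustration of (20.44), the following Python sketch implements the MAP rule through the log likelihood-ratio. It assumes, for the sake of the example only, that $Y \sim \mathcal{N}(+A, \sigma^2)$ under $H = 0$ and $Y \sim \mathcal{N}(-A, \sigma^2)$ under $H = 1$, in which case the quadratic terms cancel and $\text{LLR}(y) = 2Ay/\sigma^2$. The function names and the numerical values are hypothetical.

```python
import math
import random


def llr_gaussian(y: float, amplitude: float, var: float) -> float:
    """LLR (20.41) assuming Y ~ N(+amplitude, var) under H = 0 and
    Y ~ N(-amplitude, var) under H = 1; the quadratic terms cancel."""
    return 2.0 * amplitude * y / var


def map_guess(llr: float, pi0: float, rng: random.Random) -> int:
    """MAP rule (20.44) for a nondegenerate prior; ties are resolved by a fair coin."""
    threshold = math.log((1.0 - pi0) / pi0)  # ln(pi1 / pi0)
    if llr > threshold:
        return 0
    if llr < threshold:
        return 1
    return rng.randrange(2)  # guess uniformly over {0, 1}


rng = random.Random(0)
print(map_guess(llr_gaussian(0.1, 1.0, 1.0), 0.5, rng))  # uniform prior
print(map_guess(llr_gaussian(0.1, 1.0, 1.0), 0.1, rng))  # prior favors H = 1
```

With the same observation the guess changes from 0 to 1 once the prior favors $H = 1$ strongly enough, because the threshold $\ln(\pi_1/\pi_0)$ then exceeds the log likelihood-ratio.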


20.8 The ML Decision Rule 


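A different decision rule, which is typically suboptimal unless H is a priori uniform,
is the Maximum-Likelihood (ML) decision rule. Its structure is similar to that
of the MAP rule except that it ignores the prior. In fact, if $\pi_0 = \pi_1$, then the two
rules are identical. The ML rule is thus given by

\[
\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) =
\begin{cases}
0 & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) > f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\
1 & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) < f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}), \\
\mathcal{U}(\{0,1\}) & \text{if } f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) = f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}).
\end{cases}
\tag{20.45}
\]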


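The ML decision rule can be alternatively described using the likelihood-ratio
function $\mathrm{LR}(\cdot)$ (20.38) as

\[
\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) =
\begin{cases}
0 & \text{if } \mathrm{LR}(\mathbf{y}_{\text{obs}}) > 1, \\
1 & \text{if } \mathrm{LR}(\mathbf{y}_{\text{obs}}) < 1, \\
\mathcal{U}(\{0,1\}) & \text{if } \mathrm{LR}(\mathbf{y}_{\text{obs}}) = 1.
\end{cases}
\tag{20.46}
\]

Alternatively, using the log likelihood-ratio function $\mathrm{LLR}(\cdot)$ (20.41):

\[
\phi_{\text{ML}}(\mathbf{y}_{\text{obs}}) =
\begin{cases}
0 & \text{if } \mathrm{LLR}(\mathbf{y}_{\text{obs}}) > 0, \\
1 & \text{if } \mathrm{LLR}(\mathbf{y}_{\text{obs}}) < 0, \\
\mathcal{U}(\{0,1\}) & \text{if } \mathrm{LLR}(\mathbf{y}_{\text{obs}}) = 0.
\end{cases}
\tag{20.47}
\]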


20.9 Performance Analysis: the Bhattacharyya Bound 
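We next derive the Bhattacharyya Bound, which is a useful upper bound on
the optimal probability of error $p^*(\text{error})$.

Starting with the exact expression (20.20) we obtain:

\begin{align*}
p^*(\text{error}) &= \int_{\mathbb{R}^d} \min\bigl\{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}), \, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})\bigr\} \, d\mathbf{y} \\
&\le \int_{\mathbb{R}^d} \sqrt{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \, \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y} \\
&= \sqrt{\pi_0 \pi_1} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y}) \, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y} \\
&\le \frac{1}{2} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y}) \, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y},
\end{align*}

where the equality in the first line follows from (20.20); the inequality in the second
line from the inequality

\[
\min\{a, b\} \le \sqrt{ab}, \quad a, b \ge 0, \tag{20.48}
\]

(which can be easily verified by treating the case $a \ge b$ and the case $a < b$
separately); the equality in the third line by trivial algebra; and where the inequality
in the fourth line follows by noting that if $c, d \ge 0$, then their geometric mean $\sqrt{cd}$
cannot exceed their arithmetic mean $(c + d)/2$, i.e.,

\[
\sqrt{cd} \le \frac{c + d}{2}, \quad c, d \ge 0, \tag{20.49}
\]

and because in our case $c = \pi_0$ and $d = \pi_1$, so $c + d = 1$.
We have thus established the bound

\[
p^*(\text{error}) \le \frac{1}{2} \int_{\mathbb{R}^d} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y}) \, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y}, \tag{20.50}
\]

which is known as the Bhattacharyya Bound.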
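
The chain of inequalities leading to (20.50) can be checked numerically. The sketch below is an illustration under stated assumptions: it picks two scalar Gaussian densities and a nonuniform prior (choices made here, not in the text), approximates the three integrals with a simple midpoint rule, and uses hypothetical helper names `gaussian_pdf`, `integrate`, and `error_and_bounds`.

```python
import math

def gaussian_pdf(y, mean, sigma):
    """Density of N(mean, sigma^2) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(func(lo + (k + 0.5) * width) for k in range(steps))

def error_and_bounds(pi0, pi1, f0, f1, lo, hi):
    """Return (exact error, intermediate bound, Bhattacharyya Bound)."""
    exact = integrate(lambda y: min(pi0 * f0(y), pi1 * f1(y)), lo, hi)
    overlap = integrate(lambda y: math.sqrt(f0(y) * f1(y)), lo, hi)
    return exact, math.sqrt(pi0 * pi1) * overlap, 0.5 * overlap

# Illustrative choice: N(1, 1) versus N(-1, 4) with prior (0.3, 0.7).
f0 = lambda y: gaussian_pdf(y, 1.0, 1.0)
f1 = lambda y: gaussian_pdf(y, -1.0, 2.0)
exact, middle, bound = error_and_bounds(0.3, 0.7, f0, f1, -30.0, 30.0)
```

The three returned numbers correspond to the first, third, and fourth lines of the derivation, so they should come out in nondecreasing order.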


20.10 Example 


Consider the problem of guessing H based on the observation Y, where H takes
on the values 0 and 1 equiprobably and where the conditional densities of Y given
H = 0 and H = 1 are

\begin{align*}
f_{Y|H=0}(y) &= \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y-A)^2/(2\sigma^2)}, \quad y \in \mathbb{R}, \tag{20.51a} \\
f_{Y|H=1}(y) &= \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y+A)^2/(2\sigma^2)}, \quad y \in \mathbb{R}, \tag{20.51b}
\end{align*}

for some deterministic $A, \sigma > 0$. Here the observable is a RV, so $d = 1$.


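For these conditional densities the likelihood-ratio function (20.38) is given by:

\begin{align*}
\mathrm{LR}(y) &= \frac{\frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y-A)^2/(2\sigma^2)}}{\frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y+A)^2/(2\sigma^2)}} \\
&= e^{4yA/(2\sigma^2)}, \quad y \in \mathbb{R}.
\end{align*}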


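Since the two hypotheses are a priori equally likely, the MAP rule is equivalent to
the ML rule and both rules guess “H = 0” or “H = 1” depending on whether the
likelihood-ratio $\mathrm{LR}(y_{\text{obs}})$ is greater or smaller than one. And since

\begin{align*}
\mathrm{LR}(y_{\text{obs}}) > 1 &\iff e^{4 y_{\text{obs}} A/(2\sigma^2)} > 1 \\
&\iff \ln\bigl(e^{4 y_{\text{obs}} A/(2\sigma^2)}\bigr) > \ln 1 \\
&\iff 4 y_{\text{obs}} A/(2\sigma^2) > 0 \\
&\iff y_{\text{obs}} > 0,
\end{align*}

and

\begin{align*}
\mathrm{LR}(y_{\text{obs}}) < 1 &\iff e^{4 y_{\text{obs}} A/(2\sigma^2)} < 1 \\
&\iff \ln\bigl(e^{4 y_{\text{obs}} A/(2\sigma^2)}\bigr) < \ln 1 \\
&\iff 4 y_{\text{obs}} A/(2\sigma^2) < 0 \\
&\iff y_{\text{obs}} < 0,
\end{align*}

it follows that the MAP decision rule guesses “H = 0,” if $y_{\text{obs}} > 0$; guesses “H = 1,”
if $y_{\text{obs}} < 0$; and guesses “H = 0” or “H = 1” equiprobably, if $y_{\text{obs}} = 0$ (i.e., in the
case of a tie).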


Note that in this example the probability of a tie is zero. Indeed, under both
hypotheses, the probability that the observed variable Y is exactly equal to zero is
zero:

\[
\Pr[Y = 0 \mid H = 0] = \Pr[Y = 0 \mid H = 1] = \Pr[Y = 0] = 0. \tag{20.52}
\]

Consequently, the way ties are resolved is immaterial.


We next compute the probability of error of the MAP decoder. To this end, let
$p_{\text{MAP}}(\text{error} \mid H = 0)$ and $p_{\text{MAP}}(\text{error} \mid H = 1)$ denote its conditional probabilities of
error. Its (unconditional) probability of error, which is also the optimal probability
of error, can be expressed as

\[
p^*(\text{error}) = \pi_0 \, p_{\text{MAP}}(\text{error} \mid H = 0) + \pi_1 \, p_{\text{MAP}}(\text{error} \mid H = 1). \tag{20.53}
\]


We proceed to compute the required terms on the RHS. Starting with the term
$p_{\text{MAP}}(\text{error} \mid H = 0)$, we note that $p_{\text{MAP}}(\text{error} \mid H = 0)$ corresponds to the
conditional probability that Y is negative or that Y is equal to zero and the coin toss
that the MAP decoder uses to resolve the tie causes the guess to be “H = 1.” By
(20.52), the conditional probability of a tie is zero, so $p_{\text{MAP}}(\text{error} \mid H = 0)$ is, in
fact, just the conditional probability that Y is negative:

\begin{align*}
p_{\text{MAP}}(\text{error} \mid H = 0) &= \Pr[Y < 0 \mid H = 0] \\
&= Q\Bigl(\frac{A}{\sigma}\Bigr), \tag{20.54}
\end{align*}

where the second equality follows because, conditional on H = 0, the random
variable Y is $\mathcal{N}(A, \sigma^2)$, and the probability that it is smaller than zero can be thus
computed using the Q-function as in (19.12b). Similarly,

\begin{align*}
p_{\text{MAP}}(\text{error} \mid H = 1) &= \Pr[Y > 0 \mid H = 1] \\
&= Q\Bigl(\frac{A}{\sigma}\Bigr). \tag{20.55}
\end{align*}


Note that in this example the MAP rule is “fair” in the sense that the conditional 
probability of error given H = 0 is the same as given H = 1. This is a coincidence 
(that results from the symmetry in the problem). In general, the MAP rule need 
not be fair. 
We conclude from (20.53), (20.54), and (20.55) that

\[
p^*(\text{error}) = Q\Bigl(\frac{A}{\sigma}\Bigr). \tag{20.56}
\]
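
The closed form (20.56) can be compared against a simulation of the rule "guess 0 if the observation is positive, else 1". The following is a minimal sketch, not part of the original derivation: the names `q_function` and `simulate_map_error`, the seed, and the number of trials are all choices made here for illustration. The Q-function is evaluated through the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
import math
import random

def q_function(x):
    """Q(x) = Pr[N(0,1) > x], via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_map_error(amplitude, sigma, trials, seed=0):
    """Empirical error rate of the sign-based MAP rule under a uniform prior."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        h = rng.randrange(2)                         # H uniform on {0, 1}
        mean = amplitude if h == 0 else -amplitude   # N(A, s^2) or N(-A, s^2)
        y = rng.gauss(mean, sigma)
        guess = 0 if y > 0 else 1                    # ties have probability zero
        errors += (guess != h)
    return errors / trials
```

For example, `simulate_map_error(1.0, 1.0, 200000)` should land close to `q_function(1.0)`, with the discrepancy shrinking as the number of trials grows.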


Figure 20.2 depicts the conditional densities of Y given H = 0 and given H = 1
and the decision regions of the MAP decision rule $\phi_{\text{MAP}}(\cdot)$. The area of the shaded
region is the probability of an error conditioned on H = 0.


Note that the optimal decision rule for this example is not unique. Another optimal
decision rule is to guess “H = 0” if $y_{\text{obs}}$ is positive but not equal to 17, and to
guess “H = 1” otherwise.


Figure 20.2: Binary hypothesis testing with a uniform prior. Conditional on H = 0
the observable Y is $\mathcal{N}(A, \sigma^2)$ and conditional on H = 1 it is $\mathcal{N}(-A, \sigma^2)$. The area
of the shaded region is the probability of error of the MAP rule conditional on
H = 0.


Even though we have an exact expression for the probability of error (20.56) it is
instructive to compute the Bhattacharyya Bound too:

\begin{align*}
p^*(\text{error}) &\le \frac{1}{2} \int_{-\infty}^{\infty} \sqrt{f_{Y|H=0}(y) \, f_{Y|H=1}(y)} \, dy \\
&= \frac{1}{2} \int_{-\infty}^{\infty} \sqrt{\frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y-A)^2/(2\sigma^2)} \, \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-(y+A)^2/(2\sigma^2)}} \, dy \\
&= \frac{1}{2} \, e^{-A^2/(2\sigma^2)} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-y^2/(2\sigma^2)} \, dy \\
&= \frac{1}{2} \, e^{-A^2/(2\sigma^2)}, \tag{20.57}
\end{align*}



where the first line follows from (20.50); the second from (20.51); the third by simple 
algebra; and the final equality because the Gaussian density (like all densities) 
integrates to one. 


As an aside, we have from (20.57) and (20.56) the bound

\[
Q(\alpha) \le \frac{1}{2} \, e^{-\alpha^2/2}, \quad \alpha \ge 0, \tag{20.58}
\]

which we encountered in Proposition 19.4.2.
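
As a quick sanity check of (20.58), the bound can be evaluated on a grid of nonnegative arguments. This is a sketch only; the helper names and the grid are assumptions made here, and the Q-function is again computed from the standard identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$.

```python
import math

def q_function(x):
    """Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bhattacharyya_q_bound(alpha):
    """Right-hand side of (20.58)."""
    return 0.5 * math.exp(-alpha ** 2 / 2.0)

# Gap between the bound and Q on the grid 0.0, 0.1, ..., 8.0.
grid = [k / 10.0 for k in range(81)]
gaps = [bhattacharyya_q_bound(a) - q_function(a) for a in grid]
```

Every entry of `gaps` should be nonnegative, and the gap vanishes at zero, where both sides equal one half.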


20.11 (Nontelepathic) Processing 


To further emphasize the optimality of the Maximum A Posteriori decision rule, 
and for ulterior motives that have to do with the introduction of conditional inde- 
pendence, we shall next show that no processing of the observables can reduce the 
probability of a guessing error. To that end we shall have to properly define what 
we mean by “processing.” 

The first thing that comes to mind is to consider processing as the application of
some deterministic mapping. I.e., we think of mapping the observation $\mathbf{y}_{\text{obs}}$ using
some deterministic function $g(\cdot)$ to $g(\mathbf{y}_{\text{obs}})$ and then guessing H based on $g(\mathbf{y}_{\text{obs}})$.
That this cannot reduce the probability of error is clear from Figure 20.3, which
demonstrates that mapping $\mathbf{y}_{\text{obs}}$ to $g(\mathbf{y}_{\text{obs}})$ and then guessing H based on $g(\mathbf{y}_{\text{obs}})$
can be viewed as a special case of guessing H based on $\mathbf{y}_{\text{obs}}$ and, as such, cannot
outperform the MAP decision rule, which is optimal among all decision rules based
on $\mathbf{y}_{\text{obs}}$.


Figure 20.3: No decision rule based on $g(\mathbf{y}_{\text{obs}})$ can outperform an optimal decision
rule based on $\mathbf{y}_{\text{obs}}$, because computing $g(\mathbf{y}_{\text{obs}})$ and then forming the decision based
on the answer can be viewed as a special case of guessing based on $\mathbf{y}_{\text{obs}}$.


A more general kind of processing involves randomization, or “dithering.” Here we
envision the processor as using a local random number generator to generate a
random variable $\Theta$ and then producing an output of the form $g(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$, where $\theta_{\text{obs}}$
is the outcome of $\Theta$, and where $g(\cdot)$ is some deterministic function. Here $\Theta$ is
assumed to be independent of the pair $(H, \mathbf{Y})$, so the processor can generate it
using a local random number generator.


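An argument very similar to the one we used in Section 20.6 (in the second proof of
the claim that randomized decision rules cannot outperform optimal deterministic
rules) can be used to show that this type of processing cannot improve our guessing.
The argument is as follows. We view the application of the function $g(\cdot)$ to the
pair $(\mathbf{Y}, \Theta)$ as deterministic processing of the pair $(\mathbf{Y}, \Theta)$, so no decision rule based
on $g(\mathbf{Y}, \Theta)$ can outperform a decision rule that is optimal for guessing H based
on $(\mathbf{Y}, \Theta)$. It thus remains to show that the decision rule ‘Guess “H = 0” if
$\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$’ is also optimal when observing $(\mathbf{Y}, \Theta)$ and not
only $\mathbf{Y}$. This follows from Theorem 20.5.2 by noting that the independence of $\Theta$
and $(H, \mathbf{Y})$ implies that

\begin{align*}
f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta_{\text{obs}}), \\
f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta_{\text{obs}}),
\end{align*}

and hence that this rule guesses “H = 0” only when $\mathbf{y}_{\text{obs}}$ and $\theta_{\text{obs}}$ are such that
$\pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) \ge \pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$ and guesses “H = 1” only when
$\pi_1 f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) \ge \pi_0 f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$.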


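Fearless readers who are not afraid to divide by zero should note that

\begin{align*}
\mathrm{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) &= \frac{f_{\mathbf{Y},\Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}{f_{\mathbf{Y},\Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})} \\
&= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) \, f_{\Theta}(\theta_{\text{obs}})} \\
&= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}, \quad f_{\Theta}(\theta_{\text{obs}}) \ne 0 \\
&= \mathrm{LR}(\mathbf{y}_{\text{obs}}), \quad f_{\Theta}(\theta_{\text{obs}}) \ne 0,
\end{align*}

so (ignoring some technical issues) the MAP detector based on $(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})$
ignores $\theta_{\text{obs}}$ and is identical to the MAP detector based on $\mathbf{y}_{\text{obs}}$ only.⁴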


Ostensibly more general is processing $\mathbf{Y}$ by mapping it to $g(\mathbf{Y}, \Theta)$, where the
distribution of $\Theta$ is allowed to depend on $\mathbf{y}_{\text{obs}}$. This motivates us to further extend
the notion of processing. The cleanest way to define processing is to define its
outcome rather than the way it is generated.


Before defining processing we remind the reader of the notion of conditional inde- 
pendence. But first we recall the definition of (unconditional) independence. We 
do so for discrete random variables using their Probability Mass Function (PMF). 
The extension to random variables with a joint density is straightforward. For the 
definition of independence in more general scenarios see, for example, (Billingsley,
1995, Section 20) or (Loève, 1963, Section 15) or (Williams, 1991, Chapter 4).


Definition 20.11.1 (Independent Discrete Random Variables). We say that the
discrete random variables X and Y of joint PMF $P_{X,Y}(\cdot,\cdot)$ and marginal PMFs
$P_X(\cdot)$ and $P_Y(\cdot)$ are independent if $P_{X,Y}(\cdot,\cdot)$ factors as

\[
P_{X,Y}(x, y) = P_X(x) \, P_Y(y). \tag{20.59}
\]

Equivalently, X and Y are independent if, for every outcome y such that $P_Y(y) > 0$,
the conditional distribution of X given Y = y is the same as its unconditional
distribution:

\[
P_{X|Y}(x|y) = P_X(x), \quad P_Y(y) > 0. \tag{20.60}
\]

Equivalently, X and Y are independent if, for every outcome x such that $P_X(x) > 0$,
the conditional distribution of Y given X = x is the same as its unconditional
distribution:

\[
P_{Y|X}(y|x) = P_Y(y), \quad P_X(x) > 0. \tag{20.61}
\]

The equivalence of (20.59) and (20.60) follows because, by the definition of the
conditional probability mass function,

\[
P_{X|Y}(x|y) = \frac{P_{X,Y}(x, y)}{P_Y(y)}, \quad P_Y(y) > 0.
\]

Similarly, the equivalence of (20.59) and (20.61) follows from

\[
P_{Y|X}(y|x) = \frac{P_{X,Y}(x, y)}{P_X(x)}, \quad P_X(x) > 0.
\]

The beauty of (20.59) is that it is symmetric in X, Y. It makes it clear that X
and Y are independent if, and only if, Y and X are independent. This is not
obvious from (20.60) or (20.61).
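
The factorization (20.59) is straightforward to test for a PMF given as a finite table. The sketch below assumes a hypothetical representation, a Python dictionary mapping pairs `(x, y)` to probabilities, and a numerical tolerance `tol`; neither comes from the text.

```python
from itertools import product

def marginals(joint):
    """Marginal PMFs of X and Y from a joint PMF {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def is_independent(joint, tol=1e-12):
    """Check P_{X,Y}(x, y) = P_X(x) P_Y(y) for every pair, as in (20.59)."""
    px, py = marginals(joint)
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x, y in product(px, py))

# A product PMF (independent) and a PMF with X = Y (dependent).
independent = {(0, 0): 0.06, (0, 1): 0.14, (1, 0): 0.24, (1, 1): 0.56}
dependent = {(0, 0): 0.5, (1, 1): 0.5}
```

Checking the symmetric product form avoids dividing by marginals that might be zero, which is the practical counterpart of the remark that (20.59) is the cleanest of the three characterizations.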


⁴ Technical issues arise when the outcome of $\Theta$, namely $\theta_{\text{obs}}$, is such that $f_{\Theta}(\theta_{\text{obs}}) = 0$.

The definition of the conditional independence of X and Y given Z is similar, except
that we condition everywhere on Z. Again we only consider the discrete case and
refer the reader to (Loève, 1963, Section 25.3) and (Chung, 2001, Section 9.2) for
the general case.


Definition 20.11.2 (Conditionally Independent Discrete Random Variables). Let
the discrete random variables X, Y, Z have a joint PMF $P_{X,Y,Z}(\cdot,\cdot,\cdot)$. We say that
X and Y are conditionally independent given Z and write $X -\!\circ\!- Z -\!\circ\!- Y$ if

\[
P_{X,Y|Z}(x, y|z) = P_{X|Z}(x|z) \, P_{Y|Z}(y|z), \quad P_Z(z) > 0. \tag{20.62}
\]

Equivalently, X and Y are conditionally independent given Z if, for any outcome
y, z with $P_{Y,Z}(y, z) > 0$, the conditional distribution of X given that Y = y and
Z = z is the same as the distribution of X when conditioned on Z = z only:

\[
P_{X|Y,Z}(x|y, z) = P_{X|Z}(x|z), \quad P_{Y,Z}(y, z) > 0. \tag{20.63}
\]

Or, equivalently, X and Y are conditionally independent given Z if

\[
P_{Y|X,Z}(y|x, z) = P_{Y|Z}(y|z), \quad P_{X,Z}(x, z) > 0. \tag{20.64}
\]

The equivalence of (20.62) and (20.63) follows because, by the definition of the
conditional probability mass function,

\[
P_{X|Y,Z}(x|y, z) = \frac{P_{X,Y|Z}(x, y|z)}{P_{Y|Z}(y|z)}, \quad P_{Y,Z}(y, z) > 0,
\]

and similarly the equivalence of (20.62) and (20.64) follows from

\[
P_{Y|X,Z}(y|x, z) = \frac{P_{X,Y|Z}(x, y|z)}{P_{X|Z}(x|z)}, \quad P_{X,Z}(x, z) > 0.
\]

Again, the beauty of (20.62) is that it is symmetric in X, Y. Thus $X -\!\circ\!- Z -\!\circ\!- Y$ if,
and only if, $Y -\!\circ\!- Z -\!\circ\!- X$. When X and Y are conditionally independent given Z
we sometimes say that $X -\!\circ\!- Z -\!\circ\!- Y$ forms a Markov chain.
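
Condition (20.62) can also be tested on a finite table. The sketch below is illustrative and rests on assumptions made here: the joint PMF is a dictionary keyed by triples `(x, y, z)`, and the check uses the product form $P_{X,Y,Z}(x,y,z)\,P_Z(z) = P_{X,Z}(x,z)\,P_{Y,Z}(y,z)$, which is (20.62) multiplied through by $P_Z(z)^2$ and hence equivalent to it whenever $P_Z(z) > 0$.

```python
def is_conditionally_independent(joint, tol=1e-12):
    """Check X -o- Z -o- Y for a joint PMF {(x, y, z): probability}."""
    pz, pxz, pyz = {}, {}, {}
    for (x, y, z), p in joint.items():
        pz[z] = pz.get(z, 0.0) + p
        pxz[(x, z)] = pxz.get((x, z), 0.0) + p
        pyz[(y, z)] = pyz.get((y, z), 0.0) + p
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    for z, prob_z in pz.items():
        if prob_z == 0.0:
            continue                      # (20.62) only constrains P_Z(z) > 0
        for x in xs:
            for y in ys:
                lhs = joint.get((x, y, z), 0.0) * prob_z
                rhs = pxz.get((x, z), 0.0) * pyz.get((y, z), 0.0)
                if abs(lhs - rhs) > tol:
                    return False
    return True

# Markov chain: X fair bit, Z flips X w.p. 0.1, Y flips Z w.p. 0.2.
chain = {(x, y, z): 0.5 * (0.9 if z == x else 0.1) * (0.8 if y == z else 0.2)
         for x in (0, 1) for y in (0, 1) for z in (0, 1)}
# Counterexample: Y = X, with Z an independent fair bit.
copy = {(x, x, z): 0.25 for x in (0, 1) for z in (0, 1)}
```

The first table satisfies the Markov condition by construction; in the second, knowing Z says nothing about X, yet Y determines X, so the condition fails.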


The equivalence between the different definitions of conditional independence con- 
tinues to hold in the general case where the random variables are not necessarily 
discrete. We only reluctantly state this as a theorem, because we never defined 
conditional independence in nondiscrete settings. 


Theorem 20.11.3 (Equivalent Definition for Conditional Independence). Let X,
Y, and Z be random vectors. Then the following statements are equivalent:

(a) X and Y are conditionally independent given Z.


(b) The conditional distribution of Y given (X, Z) is equal to its conditional
distribution given Z.


(c) The conditional distribution of X given (Z, Y) is equal to its conditional
distribution given Z.


Proof. For a precise definition of concepts appearing in this theorem and for a
proof of the equivalence between the statements see (Loève, 1963, Section 25.3)
and particularly Theorem 25.3A therein.


We are now ready to define the processing of the observation Y with respect to 
the hypothesis H. 


Definition 20.11.4 (Processing). We say that Z is the result of processing Y 
with respect to H if H and Z are conditionally independent given Y. 


As we next show, this definition of processing extends the previous ones. We
first show that if $\mathbf{Z} = g(\mathbf{Y})$ for some deterministic Borel measurable function $g(\cdot)$
then $H -\!\circ\!- \mathbf{Y} -\!\circ\!- g(\mathbf{Y})$. This follows by noting that, conditional on $\mathbf{Y}$, the random
variable $g(\mathbf{Y})$ is deterministic and hence independent of everything and a fortiori
of H.

We next show that if $\Theta$ is independent of $(H, \mathbf{Y})$, then $H -\!\circ\!- \mathbf{Y} -\!\circ\!- g(\mathbf{Y}, \Theta)$.
Indeed, if $\mathbf{Z} = g(\mathbf{Y}, \Theta)$ with $\Theta$ being independent of $(\mathbf{Y}, H)$, then, conditionally on
$\mathbf{Y} = \mathbf{y}$, the distribution of $\mathbf{Z}$ is simply the distribution of $g(\mathbf{y}, \Theta)$ so (under this
conditioning) $\mathbf{Z}$ is independent of H.


We next show that processing the observables cannot help decrease the probability 
of error. The proof is conceptually very simple; the neat part is in the definition. 


Theorem 20.11.5 (Processing Is Futile). If Z is the result of processing Y with 
respect to H, then no rule for guessing H based on Z can outperform an optimal 
guessing rule based on Y. 


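Proof. Surely no decision rule that guesses H based on $\mathbf{Z}$ can outperform an
optimal decision rule based on $\mathbf{Z}$, let alone outperform a decision rule that is
optimal for guessing H based on $\mathbf{Z}$ and $\mathbf{Y}$. But an optimal decision rule based on
the pair $(\mathbf{Z}, \mathbf{Y})$ is the MAP rule, which compares

\[
\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}, \mathbf{Z} = \mathbf{z}] \quad \text{and} \quad \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}, \mathbf{Z} = \mathbf{z}].
\]

And, because $H -\!\circ\!- \mathbf{Y} -\!\circ\!- \mathbf{Z}$, it follows from Theorem 20.11.3 that this is equivalent
to comparing

\[
\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}] \quad \text{and} \quad \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}],
\]

i.e., to an optimal (MAP) decision rule based on $\mathbf{Y}$ only.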
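
Theorem 20.11.5 can be illustrated on a small discrete example. The sketch below rests on assumptions made here, not in the text: the observation takes finitely many values, the joint PMF of (H, observation) is a dictionary, and the processing is a hypothetical random mapping `channel[y][z] = Pr[Z = z | Y = y]` that does not depend on H, so that H, Y, Z form a Markov chain. For a PMF, the optimal probability of error is one minus the sum over observations of the larger of the two joint probabilities, the discrete analogue of integrating the minimum in (20.20).

```python
def optimal_error(joint):
    """Optimal error for guessing H from a joint PMF {(h, obs): probability}."""
    best = {}
    for (h, obs), p in joint.items():
        best[obs] = max(best.get(obs, 0.0), p)   # MAP keeps the larger term
    return 1.0 - sum(best.values())

def process(joint_hy, channel):
    """Joint PMF of (H, Z) when Z is drawn from channel[y] given Y = y."""
    joint_hz = {}
    for (h, y), p in joint_hy.items():
        for z, w in channel[y].items():
            joint_hz[(h, z)] = joint_hz.get((h, z), 0.0) + p * w
    return joint_hz

joint_hy = {(0, "a"): 0.4, (0, "b"): 0.1, (1, "a"): 0.1, (1, "b"): 0.4}
channel = {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.3, "b": 0.7}}
error_y = optimal_error(joint_hy)
error_z = optimal_error(process(joint_hy, channel))
```

In this example the error based on Y is 0.2 and the error based on Z is 0.35, consistent with the theorem.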


The above theorem is more powerful than it seems. To demonstrate its strength,
we next use it to show that in testing for a signal in Gaussian noise, irrespective of
the prior, the optimal probability of error is monotonically nondecreasing in the
noise variance. The setup we consider is one where H is of prior $(\pi_0, \pi_1)$ and aiding
us in guessing H is the observable Y, which, conditional on H = m, is $\mathcal{N}(\alpha_m, \sigma^2)$
for $m \in \{0, 1\}$. We shall argue that, irrespective of the prior $(\pi_0, \pi_1)$, the optimal
probability of error is monotonically nondecreasing in $\sigma^2$.


The beauty of the argument is that it allows us to prove the monotonicity result
without having to calculate the optimal probability of error explicitly (as we did
in Section 20.10 for the case of a uniform prior with $\alpha_0 = A$ and $\alpha_1 = -A$). While
we could also compute the optimal probability of error for this more general setup
and then use calculus to derive the monotonicity result, the argument we present
instead has the advantage of also being applicable to multi-dimensional
multi-hypothesis testing scenarios, where there is typically no closed-form expression for
the optimal probability of error.


To prove this result, let $p^*(\sigma^2)$ denote the optimal probability of error as a function
of $\sigma^2$. We need to show that $p^*(\sigma^2) \le p^*(\sigma^2 + \delta^2)$, for all $\delta \in \mathbb{R}$. Consider the
low-noise case where the conditional law of Y given H is $\mathcal{N}(\alpha_m, \sigma^2)$. Suppose that
the receiver generates $W \sim \mathcal{N}(0, \delta^2)$ independently of (H, Y) and adds W to Y
to form Z = Y + W. Since Z is the result of processing Y with respect to H, it
follows that the optimal probability of error based on Y, namely $p^*(\sigma^2)$, is at least
as good as the optimal probability of error based on Z (Theorem 20.11.5). We
now complete the argument by showing that the optimal probability of error based
on Z is $p^*(\sigma^2 + \delta^2)$. This follows because, by Proposition 19.7.2, the conditional
law of Z given H is $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$.
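
Although the argument deliberately avoids computing the optimal probability of error, the monotonicity claim can still be observed numerically for a nonuniform prior by integrating the minimum in (20.20). The following sketch is an illustration under assumptions chosen here: the prior, the means, the list of variances, the integration range of twelve standard deviations, and the midpoint rule are all arbitrary choices, and the function names are hypothetical.

```python
import math

def gaussian_pdf(y, mean, variance):
    """Density of N(mean, variance) at y."""
    return math.exp(-(y - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def optimal_error(pi0, pi1, alpha0, alpha1, variance, steps=40000):
    """Midpoint-rule approximation of the integral of min{pi0 f0, pi1 f1}."""
    spread = 12.0 * math.sqrt(variance)
    lo = min(alpha0, alpha1) - spread
    hi = max(alpha0, alpha1) + spread
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        y = lo + (k + 0.5) * width
        total += min(pi0 * gaussian_pdf(y, alpha0, variance),
                     pi1 * gaussian_pdf(y, alpha1, variance))
    return total * width

variances = [0.25, 0.5, 1.0, 2.0, 4.0]
errors = [optimal_error(0.3, 0.7, 1.0, -1.0, v) for v in variances]
```

The list `errors` should be nondecreasing, and each entry cannot exceed the smaller prior probability, because always guessing the more likely hypothesis already achieves that error.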

Stated differently, since using a local random number generator the receiver can
produce from an observation Y of conditional law $\mathcal{N}(\alpha_m, \sigma^2)$ a random variable Z
whose conditional law is $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$, the minimal probability of error based
on an observation having conditional law $\mathcal{N}(\alpha_m, \sigma^2)$ cannot be larger than the
optimal probability of error achievable based on an observation having conditional
law $\mathcal{N}(\alpha_m, \sigma^2 + \delta^2)$. See Figure 20.4 for an illustration of this argument.


Figure 20.4: A suboptimal guessing rule (with randomization) for testing
$\mathcal{N}(\alpha_0, \sigma^2)$ vs. $\mathcal{N}(\alpha_1, \sigma^2)$ with the given prior $(\pi_0, \pi_1)$. It attains the optimal
probability of error for guessing $\mathcal{N}(\alpha_0, \sigma^2 + \delta^2)$ vs. $\mathcal{N}(\alpha_1, \sigma^2 + \delta^2)$ (with the given
prior). In the diagram, a local Gaussian RV generator produces W, independent of (Y, H),
which is added to Y; the sum is fed to a MAP rule for testing
$\mathcal{N}(\alpha_0, \sigma^2 + \delta^2)$ vs. $\mathcal{N}(\alpha_1, \sigma^2 + \delta^2)$ with prior $(\pi_0, \pi_1)$.


This section affords a first glance at the notion of sufficient statistics, which will be
studied in greater depth and generality in Chapter 22. We begin with the following
example. Consider the hypothesis testing problem with a uniform prior, where the
observation is a tuple of real numbers $(Y_1, Y_2)$. Conditional on H = 0, the random
variables $Y_1, Y_2$ are IID $\mathcal{N}(0, \sigma_0^2)$, whereas conditional on H = 1 they are IID
$\mathcal{N}(0, \sigma_1^2)$, where

\[
\sigma_0 > \sigma_1 > 0. \tag{20.65}
\]

(If $\sigma_0^2 = \sigma_1^2$, then the problem is boring in that the conditional law of the observable
given H = 0 is the same as given H = 1, so the two hypotheses cannot be
differentiated. For $\sigma_0^2 \ne \sigma_1^2$ there is no loss in generality in assuming $\sigma_0 > \sigma_1$ because
we can always relabel the hypotheses. And if $\sigma_0 > \sigma_1 = 0$, then the problem is
trivial: we guess “H = 1” only if $Y_1 = Y_2 = 0$.) Thus, the observation space is the
two-dimensional Euclidean space $\mathbb{R}^2$ and, using the explicit form of the Gaussian
density (19.6),

\begin{align*}
f_{Y_1,Y_2|H=0}(y_1, y_2) &= \frac{1}{2\pi\sigma_0^2} \exp\Bigl(-\frac{1}{2\sigma_0^2}\bigl(y_1^2 + y_2^2\bigr)\Bigr), \quad y_1, y_2 \in \mathbb{R}, \tag{20.66a} \\
f_{Y_1,Y_2|H=1}(y_1, y_2) &= \frac{1}{2\pi\sigma_1^2} \exp\Bigl(-\frac{1}{2\sigma_1^2}\bigl(y_1^2 + y_2^2\bigr)\Bigr), \quad y_1, y_2 \in \mathbb{R}. \tag{20.66b}
\end{align*}


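Since we assumed a uniform prior, the ML decoding rule for guessing H based on
the tuple $(Y_1, Y_2)$ is optimal. To derive the ML rule explicitly, we compute the
likelihood-ratio function

\begin{align*}
\mathrm{LR}(y_1, y_2) &= \frac{f_{Y_1,Y_2|H=0}(y_1, y_2)}{f_{Y_1,Y_2|H=1}(y_1, y_2)} \\
&= \frac{\frac{1}{2\pi\sigma_0^2} \exp\bigl(-\frac{1}{2\sigma_0^2}(y_1^2 + y_2^2)\bigr)}{\frac{1}{2\pi\sigma_1^2} \exp\bigl(-\frac{1}{2\sigma_1^2}(y_1^2 + y_2^2)\bigr)} \\
&= \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr)\Bigr), \quad y_1, y_2 \in \mathbb{R}. \tag{20.67}
\end{align*}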


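Thus,

\begin{align*}
\mathrm{LR}(y_1, y_2) > 1 &\iff \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr)\Bigr) > \frac{\sigma_0^2}{\sigma_1^2} \\
&\iff \frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr)\bigl(y_1^2 + y_2^2\bigr) > \ln \frac{\sigma_0^2}{\sigma_1^2} \\
&\iff y_1^2 + y_2^2 > \frac{2\sigma_0^2\sigma_1^2}{\sigma_0^2 - \sigma_1^2} \ln \frac{\sigma_0^2}{\sigma_1^2}, \tag{20.68}
\end{align*}

where the second equivalence follows from the monotonicity of the logarithm
function (20.43); and where the last equivalence follows by multiplying both sides of
the inequality by the constant $2\sigma_0^2\sigma_1^2/(\sigma_0^2 - \sigma_1^2)$ (without the need to change the
inequality direction because this constant is by (20.65) positive).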
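
The equivalence (20.68) says that comparing the likelihood ratio with one is the same as comparing $y_1^2 + y_2^2$ with a fixed threshold. A minimal Python sketch of both sides follows; the function names are hypothetical, and the code assumes $\sigma_0 > \sigma_1 > 0$ as in (20.65).

```python
import math

def ml_threshold(sigma0, sigma1):
    """Right-hand side of (20.68); assumes sigma0 > sigma1 > 0."""
    s0, s1 = sigma0 ** 2, sigma1 ** 2
    return 2.0 * s0 * s1 / (s0 - s1) * math.log(s0 / s1)

def likelihood_ratio(y1, y2, sigma0, sigma1):
    """Likelihood ratio in the closed form (20.67)."""
    s0, s1 = sigma0 ** 2, sigma1 ** 2
    return (s1 / s0) * math.exp(0.5 * (1.0 / s1 - 1.0 / s0) * (y1 ** 2 + y2 ** 2))

def ml_guess(y1, y2, sigma0, sigma1):
    """Guess 0 if y1^2 + y2^2 exceeds the threshold, else 1.

    Ties occur with probability zero, so resolving them as 1 is immaterial.
    """
    return 0 if y1 ** 2 + y2 ** 2 > ml_threshold(sigma0, sigma1) else 1
```

For $\sigma_0 = 2$ and $\sigma_1 = 1$ the threshold is $(8/3)\ln 4 \approx 3.70$, so the point (2, 0) is attributed to H = 0 and the point (1, 1) to H = 1, in agreement with the sign of $\mathrm{LR} - 1$ at those points.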


It follows from (20.68) that the ML decision rule for guessing H based on $(Y_1, Y_2)$
computes $Y_1^2 + Y_2^2$ and then compares the result to a threshold. It is interesting to
note that to implement this decision rule one need not observe $Y_1$ and $Y_2$ directly;
it suffices to observe the sum of their squares

\[
T \triangleq Y_1^2 + Y_2^2. \tag{20.69}
\]

Of course, being the result of processing $(Y_1, Y_2)$ with respect to H, no guess of H
based on T can outperform an optimal guess based on $(Y_1, Y_2)$ (Section 20.11).
But what is interesting about this example is that, even though one cannot recover
$(Y_1, Y_2)$ from T (so there are some decision rules based on $(Y_1, Y_2)$ that cannot
be implemented if one only knows T), the ML rule based on $(Y_1, Y_2)$ only requires
knowledge of T. Thus, in this example, even though pre-processing the observations
to produce $T = Y_1^2 + Y_2^2$ is not reversible, basing one’s decision on T incurs no loss
in optimality. An optimal decision rule based on T is just as good as an optimal
rule based on $(Y_1, Y_2)$.


The reason for this can be traced to the fact that, in this example, to compute the
likelihood-ratio $\mathrm{LR}(y_1, y_2)$ one need not know the pair $(y_1, y_2)$; it suffices that one
know the sum of their squares $y_1^2 + y_2^2$; see (20.67). In this sense $T = Y_1^2 + Y_2^2$
forms a sufficient statistic for guessing H from $(Y_1, Y_2)$, as we next define.


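We would like to define a mapping $T(\cdot)$ from the observation space $\mathbb{R}^d$ to $\mathbb{R}^{d'}$ as
being sufficient for the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ if the likelihood-ratio
$\mathrm{LR}(\mathbf{y}_{\text{obs}})$ can be computed from $T(\mathbf{y}_{\text{obs}})$ for every $\mathbf{y}_{\text{obs}}$ in $\mathbb{R}^d$. However, for
technical reasons, we require slightly less: we only require that $\mathrm{LR}(\mathbf{y}_{\text{obs}})$ be computable
from $T(\mathbf{y}_{\text{obs}})$ for those observations $\mathbf{y}_{\text{obs}}$ for which at least one of the densities is
positive (so the likelihood-ratio is not of the form 0/0) and that additionally lie
outside some prespecified set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero.⁵ Thus, we shall
require that there exist a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero and a function
$\zeta \colon \mathbb{R}^{d'} \to [0, \infty]$ such that $\zeta(T(\mathbf{y}_{\text{obs}}))$ is equal to $\mathrm{LR}(\mathbf{y}_{\text{obs}})$ whenever

\[
\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0 \quad \text{and} \quad f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) > 0. \tag{20.70}
\]

Note that the fact that $\mathcal{Y}_0$ is of Lebesgue measure zero implies that

\[
\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 0] = \Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 1] = 0. \tag{20.71}
\]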


⁵ We allow this exception set so that the question of whether $T(\cdot)$ forms a sufficient statistic
or not will not depend on our choice of the density function of the conditional distribution of the
observable. (Recall that if a RV has a probability density function, then it has infinitely many
different probability density functions, every two of which differ on a set of Lebesgue measure
zero.)

To convince the reader that this really is only “slightly” less, we note:


Note 20.12.1. Both conditional on H = 0 and conditional on H = 1, the
probability that the observable violates (20.70) is zero.


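Proof. We shall show that conditional on H = 0, the probability that the
observable violates (20.70) is zero. The conditional probability given H = 1 can be
analogously shown to be zero. The condition that (20.70) is violated is equivalent
to the condition that either $\mathbf{y}_{\text{obs}} \in \mathcal{Y}_0$ or $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) = 0$. By
(20.71), $\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid H = 0] = 0$. And, by the nonnegativity of the densities,

\begin{align*}
\Pr\bigl[f_{\mathbf{Y}|H=0}(\mathbf{Y}) + f_{\mathbf{Y}|H=1}(\mathbf{Y}) = 0 \bigm| H = 0\bigr] &\le \Pr\bigl[f_{\mathbf{Y}|H=0}(\mathbf{Y}) = 0 \bigm| H = 0\bigr] \\
&= \int_{\{\mathbf{y} \in \mathbb{R}^d \colon f_{\mathbf{Y}|H=0}(\mathbf{y}) = 0\}} f_{\mathbf{Y}|H=0}(\mathbf{y}) \, d\mathbf{y} \\
&= \int_{\{\mathbf{y} \in \mathbb{R}^d \colon f_{\mathbf{Y}|H=0}(\mathbf{y}) = 0\}} 0 \, d\mathbf{y} \\
&= 0.
\end{align*}

Conditionally on H = 0, the probability of the observable violating (20.70) is thus
the probability of the union of two events, each of which is of zero probability, and
is thus of zero probability; see Corollary 21.5.2 ahead.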


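Definition 20.12.2 (Sufficient Statistic for Two Densities). We say that a
mapping $T \colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the density functions $f_{\mathbf{Y}|H=0}(\cdot)$
and $f_{\mathbf{Y}|H=1}(\cdot)$ on $\mathbb{R}^d$ if it is Borel measurable⁶ and if there exists a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of
Lebesgue measure zero and a Borel measurable function $\zeta \colon \mathbb{R}^{d'} \to [0, \infty]$ such that
for all $\mathbf{y}_{\text{obs}} \in \mathbb{R}^d$ satisfying (20.70)

\[
\frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})} = \zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr), \tag{20.72}
\]

where on the LHS of (20.72) we define $a/0$ to be $+\infty$ whenever $a > 0$.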


In our example the observation $(Y_1, Y_2)$ takes value in $\mathbb{R}^2$ so $d = 2$; the mapping
$T \colon (y_1, y_2) \mapsto y_1^2 + y_2^2$ is a mapping from $\mathbb{R}^2$ to $\mathbb{R}$ so $d' = 1$; and, by (20.67),

\[
\zeta \colon t \mapsto \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(\frac{1}{2}\Bigl(\frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}\Bigr) t\Bigr).
\]

Here we can take $\mathcal{Y}_0$ to be the empty set.⁷


⁶ The technical condition that $T(\cdot)$ is Borel measurable guarantees that $T(\mathbf{Y})$ is a random
vector. See for example (Billingsley, 1995, Theorem 13.1(ii)) for a discussion of this technical
issue. The issue is best seen in the scalar case. Suppose that Y is a RV defined over the
probability space $(\Omega, \mathcal{F}, P)$. If $T(\cdot)$ is any function, then $T(Y)$ is a mapping from $\Omega$ to $\mathbb{R}$, but
we are not guaranteed that it be a RV, because for $T(Y)$ to be a RV we must have that, for every
$\xi \in \mathbb{R}$, the set $\{\omega \in \Omega \colon T(Y(\omega)) \le \xi\}$ be in $\mathcal{F}$, and this is, in general, not true. However, if $T(\cdot)$
is Borel measurable, then the above cited theorem guarantees that $T(Y)$ is, indeed, a RV. Note
that any continuous function is Borel measurable (Billingsley, 1995, Theorem 13.2). In practice,
one never encounters functions that are not Borel measurable; in fact, it is hard work to construct
one.


We next show that if $T(\cdot)$ is a sufficient statistic, then there is no loss in
optimality in considering decision rules that base their decision on $T(\mathbf{Y})$. This result
is almost obvious, because the MAP decision rule is optimal (Theorem 20.7.1);
because it can be expressed in terms of the likelihood-ratio function (20.40); and
because the sufficiency of $T(\cdot)$ implies that the likelihood-ratio function $\mathrm{LR}(\mathbf{y}_{\text{obs}})$
is computable from $T(\mathbf{y}_{\text{obs}})$. Nevertheless, we provide a formal proof because the
result is important.


Proposition 20.12.3. If $T \colon \mathbb{R}^d \to \mathbb{R}^{d'}$ is a sufficient statistic for the densities
$f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$, then, irrespective of the prior of H, there exists a decision
rule that guesses H based on $T(\mathbf{Y})$ and which is as good as any optimal guessing
rule based on $\mathbf{Y}$.


Proof. We need to show that if $\phi^*_{\text{Guess}}(\cdot)$ is an optimal decision rule for guessing H
based on $\mathbf{Y}$, then there exists a guessing rule based on $T(\mathbf{Y})$ that has the same
probability of error. We note that it is enough to prove this result for a
nondegenerate prior (20.2), because for degenerate priors one can achieve zero probability
of error even without looking at $T(\mathbf{Y})$: if $\Pr[H = 0] = 1$ guess “H = 0,” and if
$\Pr[H = 1] = 1$ guess “H = 1.” We thus proceed to assume a nondegenerate prior
(20.2).


Let $\phi_{\text{MAP}}(\cdot)$ be the MAP rule for guessing H based on $\mathbf{Y}$. Since this rule is optimal,
it suffices to exhibit a decoding rule $\phi_T(\cdot)$ based on $T(\mathbf{Y})$ of equal performance.
Since $T(\cdot)$ is sufficient, it follows that there exists a set of Lebesgue measure zero $\mathcal{Y}_0$
and a Borel measurable function $\zeta(\cdot)$ such that $\zeta(T(\mathbf{y}_{\text{obs}})) = \mathrm{LR}(\mathbf{y}_{\text{obs}})$, whenever
(20.70) holds. Based upon the observation $T(\mathbf{Y}) = T(\mathbf{y}_{\text{obs}})$, the desired rule is to
guess

\[
\phi_T\bigl(T(\mathbf{y}_{\text{obs}})\bigr) =
\begin{cases}
0 & \text{if } \zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr) > \frac{\pi_1}{\pi_0}, \\
1 & \text{if } \zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr) < \frac{\pi_1}{\pi_0}, \\
\mathcal{U}(\{0,1\}) & \text{if } \zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr) = \frac{\pi_1}{\pi_0}.
\end{cases}
\tag{20.73}
\]

That $\phi_T(\cdot)$ has the same performance as $\phi_{\text{MAP}}(\cdot)$ now follows by noting that,
by (20.72), the two decoding rules are in agreement except perhaps for
observations $\mathbf{y}_{\text{obs}}$ violating (20.70), but those, by Note 20.12.1, occur with probability zero.
The performance of $\phi_{\text{MAP}}(\cdot)$ (which is optimal based on $\mathbf{Y}$) and of $\phi_T(\cdot)$ (which is
based on $T(\mathbf{Y})$) are thus identical.


Definition 20.12.2 is intuitive in that it demonstrates how one typically goes about
identifying a sufficient statistic: one computes the likelihood-ratio and checks what
it depends on. This definition, however, becomes a bit cumbersome in
multi-hypothesis testing, which we shall discuss in Chapter 21. A definition that is more
appropriate for that setting is given in Chapter 22 in terms of the computability
of the a posteriori probabilities from $T(\mathbf{y}_{\text{obs}})$ (Definition 22.2.1). The purpose of
the next proposition is to show that the two definitions coincide in the binary case:
ignoring sets of Lebesgue measure zero, the likelihood-ratio can be computed from
$T(\mathbf{y}_{\text{obs}})$ (whenever the ratio is not 0/0), if, and only if, for any prior $(\pi_0, \pi_1)$ one can
compute the a posteriori distribution of H from $T(\mathbf{y}_{\text{obs}})$ (whenever $f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0$).


⁷ We would have needed to choose a nontrivial set $\mathcal{Y}_0$ if we had changed the densities (20.66)
at a finite number of points.


We draw the reader’s attention to the following subtle issue. Definition 20.12.2
makes it clear that the sufficiency of $T(\cdot)$ has nothing to do with the prior; it only
depends on the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$. The equivalent definition of
sufficient statistics in terms of the computability of the a posteriori distribution
ostensibly depends also on the prior, because it is only meaningful to discuss the a
posteriori distribution if H has a prior. Nevertheless, the definitions are equivalent
because in the latter definition we require that the a posteriori distribution be
computable from $T(\mathbf{Y})$ for every prior, and not just for the prior given in the
problem’s formulation.


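Proposition 20.12.4 (Computability of the a Posteriori Distribution). Let the
mapping $T \colon \mathbb{R}^d \to \mathbb{R}^{d'}$ be Borel measurable, and let $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$ be
densities on $\mathbb{R}^d$. Then the following two conditions are equivalent:


(a) $T(\cdot)$ forms a sufficient statistic for the densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$.


(b) For some set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero we have that for every prior
$(\pi_0, \pi_1)$ there exist Borel measurable functions from $\mathbb{R}^{d'}$ to $[0, 1]$

\[
\mathbf{t} \mapsto \psi_m(\pi_0, \pi_1, \mathbf{t}), \quad m = 0, 1,
\]

such that the vector

\[
\bigl(\psi_0(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}})), \; \psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))\bigr)^{\mathsf{T}}
\]

is a probability vector, and this probability vector is equal to the vector

\[
\bigl(\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \; \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]\bigr)^{\mathsf{T}} \tag{20.74}
\]

whenever both the condition $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$, and the condition

\[
\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) > 0 \tag{20.75}
\]

are satisfied. Here (20.74) is computed for H having the prior $(\pi_0, \pi_1)$ and
for the conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$ and $f_{\mathbf{Y}|H=1}(\cdot)$.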


Proof. We begin by proving that (a) implies (b). That is, we assume that $T(\cdot)$
forms a sufficient statistic and proceed to prove the existence of the set $\mathcal{Y}_0$ and
of the functions $\psi_0(\cdot), \psi_1(\cdot)$. Let $\mathcal{Y}_0$ and $\zeta \colon \mathbb{R}^{d'} \to [0, \infty]$ be as guaranteed by the
definition of sufficient statistics (Definition 20.12.2) so

\[
\frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})} = \zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr), \tag{20.76}
\]

whenever $\mathbf{y}_{\text{obs}}$ satisfies (20.70). We next show how to construct for every pair
$(\pi_0, \pi_1)$ the functions $\psi_0(\cdot), \psi_1(\cdot)$. We consider three cases separately: the case
$\pi_0 = 1 - \pi_1 = 1$, the case $\pi_0 = 1 - \pi_1 = 0$, and the case where both $\pi_0$ and $\pi_1$ are
strictly positive.

In the first case H is deterministically zero, and the functions $\psi_0(1, 0, \mathbf{t}) = 1$ and
$\psi_1(1, 0, \mathbf{t}) = 0$ meet our requirements. In the second case H is deterministically
one, and the functions $\psi_0(0, 1, \mathbf{t}) = 1 - \psi_1(0, 1, \mathbf{t}) = 0$ meet our requirements.


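It remains to treat the case where $\pi_0, \pi_1 > 0$. We shall show that in this case the
functions

$$\psi_0(\pi_0, \pi_1, t) \triangleq \frac{\pi_0\, \zeta(t)}{\pi_0\, \zeta(t) + \pi_1}, \qquad \psi_1(\pi_0, \pi_1, t) \triangleq 1 - \psi_0(\pi_0, \pi_1, t), \tag{20.77}$$

(where $\infty/(\infty + a)$ is defined as one for all finite $a$) meet our requirements. To
that end we first note that $\psi_0(\pi_0, \pi_1, t)$ and $\psi_1(\pi_0, \pi_1, t)$ are nonnegative and sum
to one. We next note that, for $\pi_0, \pi_1 > 0$, the condition (20.75) implies that
$f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ and $f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$ are not both zero. Consequently, if $\mathbf{y}_{\text{obs}}$ satisfies
(20.75) and also $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$, then it satisfies (20.70) and $\text{LR}(\mathbf{y}_{\text{obs}}) = \zeta(T(\mathbf{y}_{\text{obs}}))$.
Thus, in the case $\pi_0, \pi_1 > 0$, we have that, whenever (20.75) and $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$ hold,

$$\begin{aligned}
\psi_0(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))
&= \frac{\pi_0\, \zeta(T(\mathbf{y}_{\text{obs}}))}{\pi_0\, \zeta(T(\mathbf{y}_{\text{obs}})) + \pi_1} \\
&= \frac{\pi_0\, \text{LR}(\mathbf{y}_{\text{obs}})}{\pi_0\, \text{LR}(\mathbf{y}_{\text{obs}}) + \pi_1} \\
&= \frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) / f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) / f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) + \pi_1} \\
&= \frac{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})} \\
&= \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}],
\end{aligned}$$

as required. This implies that, whenever (20.75) and $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$ hold, we also have
$\psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}})) = \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$, since $\psi_1(\pi_0, \pi_1, t) = 1 - \psi_0(\pi_0, \pi_1, t)$
and since $\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = 1 - \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]$; see (20.10).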


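We now prove that (b) implies (a), i.e., that the existence of the set $\mathcal{Y}_0$ and of
the functions $\psi_0(\cdot)$, $\psi_1(\cdot)$ implies the existence of the function $\zeta(\cdot)$. In fact, we shall
prove the stronger statement that if for some nondegenerate prior the a posteriori
distribution of $H$ given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ is computable from $T(\mathbf{y}_{\text{obs}})$ (whenever (20.75)
and $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$ hold), then there exists some function $\zeta\colon \mathbb{R}^{d'} \to [0, \infty]$ such that
$\text{LR}(\mathbf{y}_{\text{obs}}) = \zeta(T(\mathbf{y}_{\text{obs}}))$, whenever $\mathbf{y}_{\text{obs}}$ satisfies (20.70).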


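To construct $\zeta(\cdot)$ from $\psi_0(\cdot)$ and $\psi_1(\cdot)$, pick some arbitrary strictly positive $\pi_0, \pi_1$
summing to one (e.g., $\pi_0 = \pi_1 = 1/2$), and define $\zeta(\cdot)$ by

$$\zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr) \triangleq \frac{\pi_1\, \psi_0(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))}{\pi_0\, \psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))}, \tag{20.78}$$

using the convention that $a/0 = \infty$ for all $a > 0$; see (20.39).

We next verify that if $\mathbf{y}_{\text{obs}}$ satisfies (20.70) then $\zeta(T(\mathbf{y}_{\text{obs}})) = \text{LR}(\mathbf{y}_{\text{obs}})$. To
this end, define $H$ to have the law $\Pr[H = 0] = \pi_0$ and $\Pr[H = 1] = \pi_1$,
and let the conditional law of $\mathbf{Y}$ given $H$ be as specified by the given densities.
Since $\pi_0$ and $\pi_1$ are strictly positive, it follows that whenever $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ and
$f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$ are not both zero, we also have $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) > 0$.
Consequently, for strictly positive $\pi_0, \pi_1$ we have that (20.70) implies that $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$
and $\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}}) > 0$ and thus, for observations $\mathbf{y}_{\text{obs}}$ satisfying
(20.70),

$$\begin{aligned}
\zeta\bigl(T(\mathbf{y}_{\text{obs}})\bigr)
&= \frac{\pi_1\, \psi_0(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))}{\pi_0\, \psi_1(\pi_0, \pi_1, T(\mathbf{y}_{\text{obs}}))} \\
&= \frac{\Pr[H = 1]\, \Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]}{\Pr[H = 0]\, \Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}]} \\
&= \text{LR}(\mathbf{y}_{\text{obs}}),
\end{aligned}$$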


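where the last equality follows by dividing the equation

$$\Pr[H = 0 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \frac{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}$$

(which is a restatement of (20.9a) for our case) by

$$\Pr[H = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] = \frac{\Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}{\Pr[H = 0]\, f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}}) + \Pr[H = 1]\, f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})}$$

(which is a restatement of (20.9b) for our case).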
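
The two constructions used in the proof can be sketched numerically. The following is only an illustrative sketch: the function names `psi0` and `zeta_from_psi` are hypothetical, and the code assumes that the value of the likelihood ratio at the observed statistic is already available as a nonnegative float (possibly `math.inf`).

```python
import math

def psi0(pi0, pi1, zeta_t):
    """Posterior probability of H = 0 computed from zeta(t), following (20.77)."""
    if pi1 == 0.0:
        return 1.0                 # H is deterministically zero
    if pi0 == 0.0:
        return 0.0                 # H is deterministically one
    if math.isinf(zeta_t):
        return 1.0                 # convention: inf / (inf + a) = 1 for finite a
    return pi0 * zeta_t / (pi0 * zeta_t + pi1)

def zeta_from_psi(pi0, pi1, psi0_value):
    """Likelihood ratio recovered from the posterior for a nondegenerate prior,
    following (20.78)."""
    psi1_value = 1.0 - psi0_value
    if psi1_value == 0.0:
        return math.inf            # convention: a / 0 = inf for a > 0
    return pi1 * psi0_value / (pi0 * psi1_value)
```

Applying `zeta_from_psi` to the output of `psi0` for a nondegenerate prior returns the original likelihood-ratio value, which is the content of the equivalence of (a) and (b).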


Once we have identified a sufficient statistic T(Y), we can proceed to derive an 
optimal guessing rule using two methods that we describe next. Again, we focus 
on nondegenerate priors. 


Method 1: We ignore the fact that T(Y) forms a sufficient statistic and simply 
use the MAP rule (20.40): 


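$$\phi_{\text{MAP}}(\mathbf{y}_{\text{obs}}) = \begin{cases}
0 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) > \frac{\pi_1}{\pi_0}, \\
1 & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) < \frac{\pi_1}{\pi_0}, \\
U(\{0, 1\}) & \text{if } \text{LR}(\mathbf{y}_{\text{obs}}) = \frac{\pi_1}{\pi_0}.
\end{cases} \tag{20.79}$$

(Because $T(\mathbf{Y})$ is a sufficient statistic, the likelihood-ratio function $\text{LR}(\mathbf{y}_{\text{obs}})$ will
be computable from $T(\mathbf{y}_{\text{obs}})$ whenever $\text{LR}(\mathbf{y}_{\text{obs}})$ does not have the pathological
form 0/0 and $\mathbf{y}_{\text{obs}}$ does not lie in the exception set $\mathcal{Y}_0$. Such pathological observations
occur with probability zero (20.12), so we need not worry about them.)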


Method 2: By Proposition 20.12.3, there is no loss in optimality in forming our
guess based on $T(\mathbf{Y})$. So we can use any optimal rule, e.g., the MAP rule, for
guessing $H$ based on the new $d'$-dimensional observations $\mathbf{t}_{\text{obs}} = T(\mathbf{y}_{\text{obs}})$. This
method requires computing the conditional distribution of the random $d'$-vector
$\mathbf{T} = T(\mathbf{Y})$ conditional on $H = 0$ and conditional on $H = 1$ and deciding according
to the rule:

$$\phi_{\text{Guess}}\bigl(T(\mathbf{y}_{\text{obs}})\bigr) = \begin{cases}
0 & \text{if } \pi_0 f_{\mathbf{T}|H=0}(T(\mathbf{y}_{\text{obs}})) > \pi_1 f_{\mathbf{T}|H=1}(T(\mathbf{y}_{\text{obs}})), \\
1 & \text{if } \pi_0 f_{\mathbf{T}|H=0}(T(\mathbf{y}_{\text{obs}})) < \pi_1 f_{\mathbf{T}|H=1}(T(\mathbf{y}_{\text{obs}})),
\end{cases} \tag{20.80}$$

with ties being resolved at random.


Why would we want to use Method 2 when we have already computed the likelihood- 
ratio function to establish the sufficiency of the statistic? The answer is that some- 
times one can demonstrate that T(Y) forms a sufficient statistic by methods that 
are not based on the computation of the likelihood-ratio. In such cases, Method 2 
may be advantageous. Also, sometimes the analysis of the probability of error in 
Method 2 is easier. The choice is ours. 


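Returning to the example of (20.66), we demonstrate Method 2 by calculating
the law of the sufficient statistic $T = Y_1^2 + Y_2^2$ under each of the hypotheses.
Recalling that the sum of the squares of two IID zero-mean Gaussians is exponential
(Note 19.8.1) we obtain:

$$f_{T|H=0}(t) = \frac{1}{2\sigma_0^2} \exp\Bigl(-\frac{t}{2\sigma_0^2}\Bigr), \quad t \ge 0, \tag{20.81a}$$

$$f_{T|H=1}(t) = \frac{1}{2\sigma_1^2} \exp\Bigl(-\frac{t}{2\sigma_1^2}\Bigr), \quad t \ge 0. \tag{20.81b}$$

Consequently, the likelihood-ratio is given by

$$\frac{f_{T|H=0}(t)}{f_{T|H=1}(t)} = \frac{\sigma_1^2}{\sigma_0^2} \exp\Bigl(t \Bigl(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\Bigr)\Bigr), \quad t \ge 0,$$

and the log likelihood-ratio by

$$\ln \frac{f_{T|H=0}(t)}{f_{T|H=1}(t)} = \ln \frac{\sigma_1^2}{\sigma_0^2} + t \Bigl(\frac{1}{2\sigma_1^2} - \frac{1}{2\sigma_0^2}\Bigr), \quad t \ge 0.$$

We thus guess “H = 0” if the log likelihood-ratio is positive,

$$t \ge \frac{2\sigma_0^2 \sigma_1^2}{\sigma_0^2 - \sigma_1^2} \ln \frac{\sigma_0^2}{\sigma_1^2},$$

i.e., if

$$y_1^2 + y_2^2 \ge \frac{2\sigma_0^2 \sigma_1^2}{\sigma_0^2 - \sigma_1^2} \ln \frac{\sigma_0^2}{\sigma_1^2}.$$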


We similarly guess “H = 1” if the log likelihood-ratio is negative, and flip a coin if 
it is zero. This is the same law we obtained in (20.68) based on Method 1. 
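
Method 2 for this example can be sketched in a few lines of Python. This is a hedged illustration rather than part of the derivation: the function names are hypothetical, a uniform prior is assumed, and the threshold form assumes $\sigma_0 > \sigma_1 > 0$.

```python
import math
import random

def guess_from_t(t, sigma0, sigma1):
    """Guess H from t = y1^2 + y2^2 using the log likelihood-ratio of T
    (uniform prior assumed)."""
    llr = math.log(sigma1**2 / sigma0**2) + t * (
        1.0 / (2.0 * sigma1**2) - 1.0 / (2.0 * sigma0**2)
    )
    if llr > 0:
        return 0
    if llr < 0:
        return 1
    return random.randint(0, 1)    # flip a coin on a tie

def threshold(sigma0, sigma1):
    """Value of t at which the log likelihood-ratio is zero (sigma0 > sigma1 > 0)."""
    return (2.0 * sigma0**2 * sigma1**2 / (sigma0**2 - sigma1**2)
            * math.log(sigma0**2 / sigma1**2))

def empirical_mean_t(sigma, num_samples=20000, seed=0):
    """Monte Carlo estimate of E[Y1^2 + Y2^2] for IID N(0, sigma^2) components;
    the exponential law in (20.81) has mean 2 * sigma^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        y1 = rng.gauss(0.0, sigma)
        y2 = rng.gauss(0.0, sigma)
        total += y1 * y1 + y2 * y2
    return total / num_samples
```

Observations with $t$ above `threshold(sigma0, sigma1)` lead to the guess 0 and those below it to the guess 1, in agreement with the rule stated above.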


20.13 Consequences of Optimality 


Consider the problem of guessing an a priori uniformly distributed binary random
variable $H$ based on the observable $Y$ whose conditional law given $H = 0$
is $\mathcal{N}(0, \sigma^2)$ and whose conditional distribution given $H = 1$ is $\mathcal{N}(1, \sigma^2)$. To derive
an optimal guessing rule we could derive the MAP rule by computing the
likelihood-ratio function as we did in Section 20.10. But having already carried
out the calculations in Section 20.10 for testing whether an observation was drawn
$\mathcal{N}(A, \sigma^2)$ or $\mathcal{N}(-A, \sigma^2)$, there is a better way. Let

$$T = Y - \frac{1}{2}. \tag{20.82}$$

Because there is a one-to-one relationship between $Y$ and $T$, there is no loss in
optimality in subtracting 1/2 from $Y$ to obtain $T$ and in then applying an optimal
decision rule to $T$. Indeed, since $Y = T + 1/2$, it follows that $Y$ is the result of
processing $T$ with respect to $H$, so no decision rule based on $Y$ can outperform an
optimal decision rule based on $T$ (Theorem 20.11.5). (Of course, no decision rule
based on $T$ can outperform an optimal one based on $Y$, because $T$ is the result of
processing $Y$ with respect to $H$.) In fact, using the terminology of Section 20.12,
$T\colon y \mapsto y - 1/2$ forms a sufficient statistic for guessing $H$ based on $Y$, because the
likelihood-ratio function $\text{LR}(y_{\text{obs}}) = f_{Y|H=0}(y_{\text{obs}}) / f_{Y|H=1}(y_{\text{obs}})$ can be expressed
as $\zeta(T(y_{\text{obs}}))$ for the mapping $\zeta\colon t \mapsto \text{LR}(t + 1/2)$. Consequently, our assertion
that there is no loss in optimality in forming our guess based on $T(Y)$ is just a
consequence of Proposition 20.12.3.
Conditional on $H = 0$, the random variable $T(Y)$ is $\mathcal{N}(-0.5, \sigma^2)$, and, conditional
on $H = 1$, it is $\mathcal{N}(+0.5, \sigma^2)$. Consequently, using the results of Section 20.10 (with
the substitution of 1/2 for $A$), we obtain that an optimal rule based on $T$ is to guess
“H = 0” if $T$ is negative, and to guess “H = 1” if $T$ is positive. To summarize, the
decision rule we derived is to guess “H = 0” if $Y - 1/2 < 0$ and to guess “H = 1”
if $Y - 1/2 > 0$.
In the terminology of Section 20.12, we used the fact that the transformation in 
(20.82) is one-to-one to conclude that T(-) forms a sufficient statistic, and we then 
used Method 2 from that section to derive an optimal decision rule. 
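
The resulting rule is simple enough to sketch directly. The snippet below is an illustration under the assumptions of this section (uniform prior, means 0 and 1, common variance $\sigma^2$); the function names are hypothetical, and the error probability $Q(1/(2\sigma))$ is the value obtained by specializing (20.98) to two means that are a distance 1 apart.

```python
import math

def q_function(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def guess_h(y):
    """Guess H from Y via T = Y - 1/2 (H = 0: mean 0, H = 1: mean 1)."""
    t = y - 0.5
    # A tie (t == 0) has probability zero; it is resolved here in favor of 1.
    return 0 if t < 0 else 1

def error_probability(sigma):
    """Error probability of the rule above for a uniform prior: Q(1 / (2 sigma))."""
    return q_function(0.5 / sigma)
```

For instance, `guess_h(0.2)` returns 0 because the shifted observation is negative.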


20.14 Multi-Dimensional Binary Gaussian Hypothesis Testing 


We now come closer to the receiver front end. The kind of problem we would
eventually like to address is the hypothesis testing problem in which, conditional
on $H = 0$, the observable is a continuous-time waveform of the form $s_0(t) + N(t)$
whereas, conditional on $H = 1$, it is of the form $s_1(t) + N(t)$, where $(N(t),\ t \in \mathbb{R})$
is some continuous-time stochastic process modeling the noise. This problem will
be addressed in Chapter 26. For now we only address the discrete-time version of
this problem.


20.14.1 The Setup 


We consider the problem of guessing the random variable $H$ that takes on the
values 0 and 1 with positive probabilities $\pi_0$ and $\pi_1$. The observable $\mathbf{Y} \in \mathbb{R}^J$ is
a random vector with $J$ components $Y^{(1)}, \ldots, Y^{(J)}$.⁸ Conditional on $H = 0$, the
components of $\mathbf{Y}$ are independent Gaussians with $Y^{(j)} \sim \mathcal{N}(s_0^{(j)}, \sigma^2)$, where $\mathbf{s}_0$ is
some deterministic vector of $J$ components $s_0^{(1)}, \ldots, s_0^{(J)}$, and where $\sigma^2 > 0$. Conditional
on $H = 1$, the components of $\mathbf{Y}$ are independent with $Y^{(j)} \sim \mathcal{N}(s_1^{(j)}, \sigma^2)$
for some other deterministic vector $\mathbf{s}_1$ of $J$ components $s_1^{(1)}, \ldots, s_1^{(J)}$. We assume
that $\mathbf{s}_0$ and $\mathbf{s}_1$ differ in at least one coordinate. The setup can be described as

$$H = 0\colon \quad Y^{(j)} = s_0^{(j)} + Z^{(j)}, \quad j = 1, 2, \ldots, J,$$

$$H = 1\colon \quad Y^{(j)} = s_1^{(j)} + Z^{(j)}, \quad j = 1, 2, \ldots, J,$$

where $Z^{(1)}, \ldots, Z^{(J)}$ are IID $\mathcal{N}(0, \sigma^2)$.

For typographical reasons, instead of denoting the observed vector by $\mathbf{y}_{\text{obs}}$, we now
denote it by $\mathbf{y}$ and its $J$ components by $y^{(1)}, \ldots, y^{(J)}$.


20.14.2 An Optimal Decision Rule 


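To find an optimal guessing rule we compute the likelihood-ratio function:

$$\begin{aligned}
\text{LR}(\mathbf{y}) &= \frac{f_{\mathbf{Y}|H=0}(\mathbf{y})}{f_{\mathbf{Y}|H=1}(\mathbf{y})} \\
&= \frac{\prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2}\Bigr)}{\prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl(-\frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}\Bigr)} \\
&= \prod_{j=1}^{J} \exp\Bigl(-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2} + \frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}\Bigr), \quad \mathbf{y} \in \mathbb{R}^J.
\end{aligned}$$

The log likelihood-ratio function is thus given by

$$\begin{aligned}
\text{LLR}(\mathbf{y}) &= \ln \text{LR}(\mathbf{y}) \\
&= \frac{1}{2\sigma^2} \sum_{j=1}^{J} \Bigl( (y^{(j)} - s_1^{(j)})^2 - (y^{(j)} - s_0^{(j)})^2 \Bigr) \\
&= \frac{1}{\sigma^2} \Bigl( \langle \mathbf{y}, \mathbf{s}_0 - \mathbf{s}_1 \rangle_E + \frac{\|\mathbf{s}_1\|^2 - \|\mathbf{s}_0\|^2}{2} \Bigr) \\
&= \frac{1}{\sigma^2} \Bigl( \langle \mathbf{y}, \mathbf{s}_0 - \mathbf{s}_1 \rangle_E - \frac{1}{2} \langle \mathbf{s}_0, \mathbf{s}_0 - \mathbf{s}_1 \rangle_E - \frac{1}{2} \langle \mathbf{s}_1, \mathbf{s}_0 - \mathbf{s}_1 \rangle_E \Bigr) \\
&= \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{\sigma^2} \Bigl( \Bigl\langle \mathbf{y}, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_E - \frac{1}{2} \Bigl\langle \mathbf{s}_0, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_E - \frac{1}{2} \Bigl\langle \mathbf{s}_1, \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \Bigr\rangle_E \Bigr) \\
&= \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{\sigma^2} \Bigl( \langle \mathbf{y}, \boldsymbol{\phi} \rangle_E - \frac{1}{2} \bigl( \langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E \bigr) \Bigr), \quad \mathbf{y} \in \mathbb{R}^J,
\end{aligned} \tag{20.83}$$

where for real vectors $\mathbf{u} = (u^{(1)}, \ldots, u^{(J)})^{\mathsf{T}}$ and $\mathbf{v} = (v^{(1)}, \ldots, v^{(J)})^{\mathsf{T}}$ taking value
in $\mathbb{R}^J$ we define⁹

$$\langle \mathbf{u}, \mathbf{v} \rangle_E \triangleq \sum_{j=1}^{J} u^{(j)} v^{(j)}, \tag{20.84}$$

$$\|\mathbf{u}\| \triangleq \sqrt{\langle \mathbf{u}, \mathbf{u} \rangle_E} = \sqrt{\sum_{j=1}^{J} \bigl(u^{(j)}\bigr)^2}, \tag{20.85}$$

and where

$$\boldsymbol{\phi} = \frac{\mathbf{s}_0 - \mathbf{s}_1}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \tag{20.86}$$

is a unit-norm vector pointing from $\mathbf{s}_1$ to $\mathbf{s}_0$.

An optimal decision rule is to guess “H = 0” when $\text{LLR}(\mathbf{y}) \ge \ln \frac{\pi_1}{\pi_0}$, i.e.,

$$\text{Guess “}H = 0\text{” if} \quad \langle \mathbf{y}, \boldsymbol{\phi} \rangle_E \ge \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0}, \tag{20.87}$$

and to guess “H = 1” otherwise. This decision rule is illustrated in Figure 20.5.
Depicted are the cases where $\pi_1/\pi_0$ is smaller than one, equal to one, and larger
than one.

Figure 20.5: Effect of the ratio $\pi_0/\pi_1$ on the decision rule. (Panels: $\pi_0 < \pi_1$, $\pi_0 = \pi_1$, $\pi_0 > \pi_1$.)

⁸ We use $J$ rather than $d$ in order to comply with the notation of Section 21.6 ahead.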


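It is interesting to note that the projection $\langle \mathbf{y}, \boldsymbol{\phi} \rangle_E\, \boldsymbol{\phi}$ of $\mathbf{y}$ onto the normalized
vector $\boldsymbol{\phi} = (\mathbf{s}_0 - \mathbf{s}_1) / \|\mathbf{s}_0 - \mathbf{s}_1\|$ forms a sufficient statistic for this problem. Indeed,
by (20.83), the log likelihood-ratio (and hence the likelihood-ratio) function is
computable from $\langle \mathbf{y}, \boldsymbol{\phi} \rangle_E$. The projection is depicted in Figure 20.6.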


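The rule (20.87) simplifies if $H$ has a uniform prior. In this case the rule is

$$\text{Guess “}H = 0\text{” if} \quad \langle \mathbf{y}, \boldsymbol{\phi} \rangle_E \ge \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2}. \tag{20.88}$$

Note that in this case the guessing rule can be implemented even if $\sigma^2$ is unknown.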
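
The projection rule (20.87) can be sketched as follows. This is an illustrative sketch only: the function names `inner` and `guess_h` and the argument names are hypothetical, and vectors are represented as plain Python lists of equal length.

```python
import math

def inner(u, v):
    """Standard (Euclidean) inner product between two J-tuples."""
    return sum(a * b for a, b in zip(u, v))

def guess_h(y, s0, s1, sigma2, pi0, pi1):
    """Guess H by comparing the projection of y onto phi with the threshold
    in (20.87); ties are resolved in favor of 0."""
    diff = [a - b for a, b in zip(s0, s1)]
    dist = math.sqrt(inner(diff, diff))          # ||s0 - s1||, assumed positive
    phi = [d / dist for d in diff]               # unit vector from s1 to s0
    threshold = (0.5 * (inner(s0, phi) + inner(s1, phi))
                 + sigma2 / dist * math.log(pi1 / pi0))
    return 0 if inner(y, phi) >= threshold else 1
```

With a uniform prior the logarithmic term vanishes, so the value passed for `sigma2` has no effect on the guess, which mirrors the remark following (20.88).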


⁹ This is sometimes called the standard inner product on $\mathbb{R}^J$ or the inner product between
$J$-tuples. The subscript “E” stands here for “Euclidean.”

Figure 20.6: The projection of $\mathbf{y}$ onto the normalized vector $\boldsymbol{\phi} = (\mathbf{s}_0 - \mathbf{s}_1) / \|\mathbf{s}_0 - \mathbf{s}_1\|$.


20.14.3. Error Probability Analysis 


We next find the error probability associated with our guessing rule. We denote
the conditional probabilities of error associated with our guessing rule by
$p_{\text{MAP}}(\text{error} \mid H = 0)$ and $p_{\text{MAP}}(\text{error} \mid H = 1)$. Since our rule is optimal, its unconditional
probability of error is $p^*(\text{error})$ and is given by

$$p^*(\text{error}) = \pi_0\, p_{\text{MAP}}(\text{error} \mid H = 0) + \pi_1\, p_{\text{MAP}}(\text{error} \mid H = 1). \tag{20.89}$$

Because in (20.87) we resolved ties by guessing “H = 0”, it follows that to evaluate
$p_{\text{MAP}}(\text{error} \mid H = 0)$ we need to evaluate the probability that a random vector $\mathbf{Y}$
drawn according to the density $f_{\mathbf{Y}|H=0}(\cdot)$ is such that the a posteriori probability
of $H = 0$ is strictly smaller than the a posteriori probability of $H = 1$. Thus, if
ties in the a posteriori distribution of $H$ are resolved in favor of guessing “H = 0”,
then

$$p_{\text{MAP}}(\text{error} \mid H = 0) = \Pr\bigl[\pi_0 f_{\mathbf{Y}|H=0}(\mathbf{Y}) < \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{Y}) \bigm| H = 0\bigr]. \tag{20.90}$$

This may seem self-referential, but it is not. Another way to state this is

$$p_{\text{MAP}}(\text{error} \mid H = 0) = \int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_{\mathbf{Y}|H=0}(\mathbf{y}) \, d\mathbf{y}, \tag{20.91}$$

where

$$\mathcal{B}_{1,0} = \bigl\{ \mathbf{y} \in \mathbb{R}^J \colon \pi_0 f_{\mathbf{Y}|H=0}(\mathbf{y}) \ge \pi_1 f_{\mathbf{Y}|H=1}(\mathbf{y}) \bigr\}. \tag{20.92}$$


To compute this probability we need the following lemma: 


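Lemma 20.14.1. Let $\pi_0$ and $\pi_1$ be strictly positive but not necessarily sum to one.
Let the vectors $\mathbf{s}_0, \mathbf{s}_1 \in \mathbb{R}^J$ differ in at least one component, i.e., $\|\mathbf{s}_0 - \mathbf{s}_1\| > 0$. Let

$$f_0(\mathbf{y}) = \Bigl(\frac{1}{\sqrt{2\pi\sigma^2}}\Bigr)^{J} \exp\Bigl(-\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_0^{(j)}\bigr)^2\Bigr), \quad \mathbf{y} \in \mathbb{R}^J,$$

$$f_1(\mathbf{y}) = \Bigl(\frac{1}{\sqrt{2\pi\sigma^2}}\Bigr)^{J} \exp\Bigl(-\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_1^{(j)}\bigr)^2\Bigr), \quad \mathbf{y} \in \mathbb{R}^J,$$

where $\sigma^2 > 0$. Define

$$\mathcal{B}_{1,0} \triangleq \bigl\{ \mathbf{y} \in \mathbb{R}^J \colon \pi_0 f_0(\mathbf{y}) \ge \pi_1 f_1(\mathbf{y}) \bigr\}.$$

Then

$$\int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_0(\mathbf{y}) \, d\mathbf{y} = Q\Bigl(\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1}\Bigr). \tag{20.93}$$

This equality continues to hold if we replace the weak inequality ($\ge$) in the definition
of $\mathcal{B}_{1,0}$ with a strict inequality ($>$).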


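Proof. Using a calculation identical to the one leading to (20.83) we obtain that
the set $\mathcal{B}_{1,0}$ can also be expressed as

$$\mathcal{B}_{1,0} = \Bigl\{ \mathbf{y} \in \mathbb{R}^J \colon \langle \mathbf{y}, \boldsymbol{\phi} \rangle_E \ge \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0} \Bigr\}, \tag{20.94}$$

where $\boldsymbol{\phi}$ is defined in (20.86).

The density $f_0(\cdot)$ is the same as the density of the vector $\mathbf{s}_0 + \mathbf{Z}$, where the components
$Z^{(1)}, \ldots, Z^{(J)}$ of $\mathbf{Z}$ are IID $\mathcal{N}(0, \sigma^2)$. Thus, the LHS of (20.93) can be
expressed as

$$\begin{aligned}
\int_{\mathbf{y} \notin \mathcal{B}_{1,0}} f_0(\mathbf{y}) \, d\mathbf{y}
&= \Pr\Bigl[ \langle \mathbf{s}_0 + \mathbf{Z}, \boldsymbol{\phi} \rangle_E < \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0} \Bigr] \\
&= \Pr\Bigl[ \langle \mathbf{Z}, \boldsymbol{\phi} \rangle_E < \frac{\langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E - \langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0} \Bigr] \\
&= \Pr\Bigl[ \langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_E > \frac{\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E - \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr] \\
&= \Pr\Bigl[ \langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_E > \frac{\langle \mathbf{s}_0 - \mathbf{s}_1, \boldsymbol{\phi} \rangle_E}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr] \\
&= \Pr\Bigl[ \langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_E > \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2} + \frac{\sigma^2}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr] \\
&= Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr),
\end{aligned}$$

where the first equality follows from (20.94) and from the observation that the
density $f_0(\cdot)$ is the density of $\mathbf{s}_0 + \mathbf{Z}$; the second because $\langle \cdot, \cdot \rangle_E$ is linear in the first
argument, so $\langle \mathbf{s}_0 + \mathbf{Z}, \boldsymbol{\phi} \rangle_E = \langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E + \langle \mathbf{Z}, \boldsymbol{\phi} \rangle_E$; the third by noting that multiplying
both sides of an inequality by $(-1)$ requires changing the direction of the inequality;
the fourth by the linear relationship $\langle \mathbf{s}_0, \boldsymbol{\phi} \rangle_E - \langle \mathbf{s}_1, \boldsymbol{\phi} \rangle_E = \langle \mathbf{s}_0 - \mathbf{s}_1, \boldsymbol{\phi} \rangle_E$; the fifth by
(20.86); and the final equality because, as we next argue, $\langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_E \sim \mathcal{N}(0, \sigma^2)$, so
we can employ (19.12a). To see that $\langle \mathbf{Z}, -\boldsymbol{\phi} \rangle_E \sim \mathcal{N}(0, \sigma^2)$, note that, by (20.86),
$\|-\boldsymbol{\phi}\| = 1$ and then employ Proposition 19.7.3.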




This establishes the first part of the lemma. The result where the weak inequality
is replaced with a strict inequality follows by replacing all the weak inequalities
in the proof with the corresponding strict inequalities and vice versa. (If $X$ has a
density, then $\Pr[X < \xi] = \Pr[X \le \xi]$.)


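By applying Lemma 20.14.1 to our problem we obtain

$$p_{\text{MAP}}(\text{error} \mid H = 0) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr). \tag{20.95}$$

Similarly, one can show that

$$p_{\text{MAP}}(\text{error} \mid H = 1) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0} \Bigr). \tag{20.96}$$

Consequently, by (20.89)

$$p^*(\text{error}) = \pi_0\, Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_0}{\pi_1} \Bigr) + \pi_1\, Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_0 - \mathbf{s}_1\|} \ln \frac{\pi_1}{\pi_0} \Bigr). \tag{20.97}$$

In the special case where the prior is uniform we obtain from (20.95), (20.96), and
(20.97)

$$p^*(\text{error}) = p_{\text{MAP}}(\text{error} \mid H = 0) = p_{\text{MAP}}(\text{error} \mid H = 1) = Q\Bigl( \frac{\|\mathbf{s}_0 - \mathbf{s}_1\|}{2\sigma} \Bigr). \tag{20.98}$$

This has a nice geometric interpretation. It is the probability that a $\mathcal{N}(0, \sigma^2)$ RV
exceeds half the distance between the vectors $\mathbf{s}_0$ and $\mathbf{s}_1$. Stated differently, since
$\|\mathbf{s}_0 - \mathbf{s}_1\| / \sigma$ is the number of standard deviations that separate $\mathbf{s}_0$ and $\mathbf{s}_1$, we can
express the probability of error as the probability that a standard Gaussian exceeds
half the distance between the vectors as measured in standard deviations of the
noise.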
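
The expressions (20.95) through (20.98) are easy to evaluate numerically. The sketch below assumes the Q-function is computed from the complementary error function in Python's standard library; the function names and the argument `dist` (standing for $\|\mathbf{s}_0 - \mathbf{s}_1\|$) are hypothetical.

```python
import math

def q_function(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def error_probabilities(dist, sigma, pi0, pi1):
    """Return (p(error | H=0), p(error | H=1), p*(error)) per (20.95)-(20.97).

    dist is the Euclidean distance between s0 and s1 (assumed positive),
    and (pi0, pi1) is a nondegenerate prior.
    """
    half_distance = dist / (2.0 * sigma)
    prior_shift = sigma / dist * math.log(pi0 / pi1)
    p_err_h0 = q_function(half_distance + prior_shift)
    p_err_h1 = q_function(half_distance - prior_shift)
    return p_err_h0, p_err_h1, pi0 * p_err_h0 + pi1 * p_err_h1
```

For a uniform prior the three returned values coincide and equal `q_function(dist / (2 * sigma))`, as in (20.98).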


20.14.4 The Bhattacharyya Bound 


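Finally, we compute the Bhattacharyya Bound for this problem. From (20.50) we
obtain that, irrespective of the values of $\pi_0, \pi_1$,

$$\begin{aligned}
p^*(\text{error})
&\le \frac{1}{2} \int_{\mathbb{R}^J} \sqrt{f_{\mathbf{Y}|H=0}(\mathbf{y})\, f_{\mathbf{Y}|H=1}(\mathbf{y})} \, d\mathbf{y} \\
&= \frac{1}{2} \int_{\mathbb{R}^J} \sqrt{ \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2}} \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}} } \, d\mathbf{y} \\
&= \frac{1}{2} \int_{\mathbb{R}^J} \prod_{j=1}^{J} \sqrt{ \frac{1}{2\pi\sigma^2}\, e^{-\frac{(y^{(j)} - s_0^{(j)})^2}{2\sigma^2}}\, e^{-\frac{(y^{(j)} - s_1^{(j)})^2}{2\sigma^2}} } \, d\mathbf{y} \\
&= \frac{1}{2} \prod_{j=1}^{J} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl( -\frac{(y - s_0^{(j)})^2 + (y - s_1^{(j)})^2}{4\sigma^2} \Bigr) dy \\
&= \frac{1}{2} \prod_{j=1}^{J} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl( -\frac{2y^2 - 2y\bigl(s_0^{(j)} + s_1^{(j)}\bigr) + \bigl(s_0^{(j)}\bigr)^2 + \bigl(s_1^{(j)}\bigr)^2}{4\sigma^2} \Bigr) dy \\
&= \frac{1}{2} \prod_{j=1}^{J} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl( -\frac{y^2 - y\bigl(s_0^{(j)} + s_1^{(j)}\bigr) + \frac{1}{2}\bigl( (s_0^{(j)})^2 + (s_1^{(j)})^2 \bigr)}{2\sigma^2} \Bigr) dy \\
&= \frac{1}{2} \prod_{j=1}^{J} \exp\Bigl( -\frac{\bigl(s_0^{(j)} - s_1^{(j)}\bigr)^2}{8\sigma^2} \Bigr) \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\Bigl( -\frac{\bigl(y - \frac{1}{2}(s_0^{(j)} + s_1^{(j)})\bigr)^2}{2\sigma^2} \Bigr) dy \\
&= \frac{1}{2} \prod_{j=1}^{J} \exp\Bigl( -\frac{\bigl(s_0^{(j)} - s_1^{(j)}\bigr)^2}{8\sigma^2} \Bigr) \\
&= \frac{1}{2} \exp\Bigl( -\frac{\|\mathbf{s}_0 - \mathbf{s}_1\|^2}{8\sigma^2} \Bigr),
\end{aligned} \tag{20.99}$$

where the last integral is evaluated using (19.22).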
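
To get a feel for how tight (20.99) is, one can compare it with the exact uniform-prior error probability (20.98). The following is a small sketch with hypothetical function names; it only evaluates the two closed-form expressions and makes no claim beyond them.

```python
import math

def q_function(x):
    """Q(x): probability that a standard Gaussian exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def exact_error_uniform_prior(dist, sigma):
    """Optimal error probability for a uniform prior, as in (20.98)."""
    return q_function(dist / (2.0 * sigma))

def bhattacharyya_bound(dist, sigma):
    """Bhattacharyya Bound (20.99); dist stands for ||s0 - s1||."""
    return 0.5 * math.exp(-dist**2 / (8.0 * sigma**2))

def compare(distances, sigma=1.0):
    """Return (dist, exact, bound) triples for a list of distances."""
    return [(d, exact_error_uniform_prior(d, sigma), bhattacharyya_bound(d, sigma))
            for d in distances]
```

Calling `compare([0.0, 1.0, 2.0, 4.0])` shows the bound coinciding with the exact value at distance zero and lying above it for positive distances.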


20.15 Guessing in the Presence of a Random Parameter 


We now consider the guessing problem when the distribution of the observable $\mathbf{Y}$
depends not only on the hypothesis $H$ but also on a random parameter $\Theta$, which
is independent of $H$. Based on the conditional densities $f_{\mathbf{Y}|\Theta, H=0}(\cdot)$, $f_{\mathbf{Y}|\Theta, H=1}(\cdot)$,
the nondegenerate prior $\pi_0, \pi_1 > 0$, and on the law of $\Theta$, we seek an optimal rule
for guessing $H$. We distinguish between two cases depending on whether we must
base our guess on the observed value $\mathbf{y}_{\text{obs}}$ of $\mathbf{Y}$ alone (random parameter not
observed) or whether we also observe the value $\theta_{\text{obs}}$ of $\Theta$ (random parameter
observed). The analysis of both cases is conceptually straightforward.


20.15.1 Random Parameter Not Observed 


The guessing problem when the random parameter is not observed is sometimes 
called “testing in the presence of a nuisance parameter.” Conceptually, the situ- 
ation is quite simple. We have only one observation, Y = yops, and an optimal 
decision rule is the MAP rule (Theorem 20.7.1). The MAP rule entails computing 
the likelihood-ratio function 


$$\text{LR}(\mathbf{y}_{\text{obs}}) = \frac{f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})} \tag{20.100}$$

and comparing the result to the threshold $\pi_1/\pi_0$; see (20.40).


Often, however, the densities $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ and $f_{\mathbf{Y}|H=1}(\mathbf{y}_{\text{obs}})$ appearing in (20.100)
are not given directly. Instead we are given the density of $\Theta$ and the conditional
density of $\mathbf{Y}$ given $(H, \Theta)$. (We shall encounter such a situation in Chapter 27
when we discuss noncoherent communications.) In such cases we can compute the
conditional density $f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})$ as follows:

$$\begin{aligned}
f_{\mathbf{Y}|H=0}(\mathbf{y}_{\text{obs}})
&= \int_{\theta} f_{\mathbf{Y}, \Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta) \, d\theta \\
&= \int_{\theta} f_{\mathbf{Y}|\Theta=\theta, H=0}(\mathbf{y}_{\text{obs}})\, f_{\Theta|H=0}(\theta) \, d\theta \\
&= \int_{\theta} f_{\mathbf{Y}|\Theta=\theta, H=0}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta) \, d\theta,
\end{aligned} \tag{20.101}$$

where the first equality follows because from the joint density one obtains the
marginal density by integrating out the variable in which we are not interested;
the second by the definition of the conditional density; and the final equality from
our assumption that $\Theta$ and $H$ are independent. (In computations such as these
it is best to think about the conditioning on $H = 0$ as defining a new law on
$(\mathbf{Y}, \Theta)$, a new law to which all the regular probabilistic manipulations, such as
marginalization and computation of conditional densities, continue to apply. We
thus simply think of the conditioning on $H = 0$ as specifying the joint law of $(\mathbf{Y}, \Theta)$
that we have in mind.)


Repeating the calculation under H = 1 we obtain that the likelihood-ratio function 
is given by 


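$$\text{LR}(\mathbf{y}_{\text{obs}}) = \frac{\int_{\theta} f_{\mathbf{Y}|\Theta=\theta, H=0}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta) \, d\theta}{\int_{\theta} f_{\mathbf{Y}|\Theta=\theta, H=1}(\mathbf{y}_{\text{obs}})\, f_{\Theta}(\theta) \, d\theta}. \tag{20.102}$$

The case where $\Theta$ is discrete can be similarly addressed. An optimal decision rule
can now be derived based on this expression for the likelihood-ratio function and
on the MAP rule (20.40).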
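
As an illustration of the discrete analog of (20.102), consider a hypothetical model that is not taken from the text: $\Theta$ is a random gain taking the values 0.5 and 1.5 equiprobably, and the scalar observation is $+\Theta + Z$ under $H = 0$ and $-\Theta + Z$ under $H = 1$, with $Z$ standard Gaussian and independent of $(H, \Theta)$. The integrals over $\theta$ then become sums over the PMF of $\Theta$. All names below are assumptions made for this sketch.

```python
import math

def gaussian_pdf(y, mean, variance):
    """Density of N(mean, variance) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

# Hypothetical PMF of the unobserved gain Theta: value -> probability.
THETA_PMF = {0.5: 0.5, 1.5: 0.5}

def likelihood_ratio(y, theta_pmf=THETA_PMF):
    """Likelihood ratio with the nuisance parameter averaged out under each hypothesis."""
    numerator = sum(p * gaussian_pdf(y, +theta, 1.0) for theta, p in theta_pmf.items())
    denominator = sum(p * gaussian_pdf(y, -theta, 1.0) for theta, p in theta_pmf.items())
    return numerator / denominator

def guess_h(y, pi0, pi1):
    """MAP guess; ties are resolved in favor of 0, which does not affect optimality."""
    return 0 if likelihood_ratio(y) >= pi1 / pi0 else 1
```

In this symmetric toy model the likelihood ratio equals one at $y = 0$, so under a uniform prior the rule reduces to the sign of the observation.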


20.15.2 Random Parameter Observed 


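When the random parameter is observed to be $\Theta = \theta_{\text{obs}}$, we merely view the
problem as a standard hypothesis testing problem with the observation consisting
of $\mathbf{Y}$ and $\Theta$. That is, we base our decision on the likelihood-ratio function

$$\text{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = \frac{f_{\mathbf{Y}, \Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}{f_{\mathbf{Y}, \Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})}. \tag{20.103}$$

The additional twist is that because $\Theta$ is independent of $H$ we have

$$\begin{aligned}
f_{\mathbf{Y}, \Theta|H=0}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}})
&= f_{\Theta|H=0}(\theta_{\text{obs}})\, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}}, H=0}(\mathbf{y}_{\text{obs}}) \\
&= f_{\Theta}(\theta_{\text{obs}})\, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}}, H=0}(\mathbf{y}_{\text{obs}}),
\end{aligned} \tag{20.104}$$

where the second equality follows from the independence of $\Theta$ and $H$. Repeating
for the conditional law of the pair $(\mathbf{Y}, \Theta)$ given $H = 1$ we have

$$f_{\mathbf{Y}, \Theta|H=1}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = f_{\Theta}(\theta_{\text{obs}})\, f_{\mathbf{Y}|\Theta=\theta_{\text{obs}}, H=1}(\mathbf{y}_{\text{obs}}). \tag{20.105}$$

Consequently, by (20.103), (20.104), and (20.105), we obtain that for $\theta_{\text{obs}}$ satisfying
$f_{\Theta}(\theta_{\text{obs}}) \ne 0$

$$\text{LR}(\mathbf{y}_{\text{obs}}, \theta_{\text{obs}}) = \frac{f_{\mathbf{Y}|H=0, \Theta=\theta_{\text{obs}}}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}|H=1, \Theta=\theta_{\text{obs}}}(\mathbf{y}_{\text{obs}})}. \tag{20.106}$$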

An optimal decision rule can be again derived based on this expression for the 
likelihood-ratio and on the MAP rule (20.40). 
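
A short sketch of (20.106) for a hypothetical scalar model (an assumption for illustration, not taken from the text): the observation is $+\Theta + Z$ under $H = 0$ and $-\Theta + Z$ under $H = 1$, with $Z$ standard Gaussian, and the gain $\Theta$ is observed. Because the density of $\Theta$ cancels, only the conditional densities of the observation given the observed gain enter the likelihood ratio.

```python
import math

def gaussian_pdf(y, mean, variance):
    """Density of N(mean, variance) evaluated at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def likelihood_ratio_observed(y, theta):
    """Likelihood ratio (20.106) for the hypothetical model with observed gain theta."""
    return gaussian_pdf(y, +theta, 1.0) / gaussian_pdf(y, -theta, 1.0)

def guess_h(y, theta, pi0, pi1):
    """MAP guess based on (y, theta); ties are resolved in favor of 0."""
    return 0 if likelihood_ratio_observed(y, theta) >= pi1 / pi0 else 1
```

For this toy model the ratio simplifies to $\exp(2\theta y)$, so a larger observed gain makes the same observation more decisive.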


20.16 Mathematical Notes 


A standard reference on hypothesis testing is (Lehmann and Romano, 2005). It 
also contains a measure-theoretic treatment of the subject. For a precise mathematical
definition of the condition X⊸−Y⊸−Z we refer the reader to (Loève,
1963, Section 25.3). For a measure-theoretic treatment of sufficient statistics see
(Loève, 1963, Section 24.4), (Billingsley, 1995, Section 34), (Romano and Siegel,
1986, pp. 154-156), and (Halmos and Savage, 1949). For a measure-theoretic treat- 
ment of the notion of conditional distribution see, for example, (Billingsley, 1995, 
Chapter 6), (Williams, 1991, Chapter 9), or (Lehmann and Romano, 2005, Chap- 
ter 2). 


20.17 Exercises 


Exercise 20.1 (Hypothesis Testing). Let $H$ take on the values 0 and 1 equiprobably.
Conditional on $H = 0$, the observable $Y$ is equal to $a + Z$, where $Z$ is a Laplace RV, i.e.,
is of density

$$f_Z(z) = \frac{1}{2} e^{-|z|}, \quad z \in \mathbb{R},$$

and $a > 0$ is a given constant. Conditional on $H = 1$, the observable $Y$ is given by $-a + Z$.

(i) Find and draw the densities $f_{Y|H=0}(\cdot)$ and $f_{Y|H=1}(\cdot)$.
(ii) Find an optimal rule for guessing $H$ based on $Y$.
(iii) Compute the optimal probability of error.
(iv) Compute the Bhattacharyya Bound.

Exercise 20.2 (A Discrete Multi-Dimensional Problem). Let $H$ take on the values 0
and 1 according to the prior $(\pi_0, \pi_1)$. Let the observation $\mathbf{Y} = (Y_1, \ldots, Y_n)^{\mathsf{T}}$ be an
$n$-dimensional binary vector. Conditional on $H = 0$, the components of the vector $\mathbf{Y}$ are
IID with

$$\Pr[Y_\ell = 1 \mid H = 0] = 1 - \Pr[Y_\ell = 0 \mid H = 0] = 0.25, \quad \ell = 1, \ldots, n.$$

Conditional on $H = 1$, the components are IID with

$$\Pr[Y_\ell = 1 \mid H = 1] = 1 - \Pr[Y_\ell = 0 \mid H = 1] = 0.75, \quad \ell = 1, \ldots, n.$$

(i) Find an optimal rule for guessing $H$ based on $\mathbf{Y}$.
(ii) Compute the optimal probability of error.
(iii) Compute the Bhattacharyya Bound.

Hint: You may need to treat the cases of $n$ even and $n$ odd separately.


Exercise 20.3 (A Multi-Antenna Receiver). Let $H$ take on the values 0 and 1 equiprobably.
We wish to guess $H$ based on the random variables $Y_1$ and $Y_2$. Conditional on
$H = 0$,

$$Y_1 = A + Z_1, \quad Y_2 = A + Z_2,$$

and conditional on $H = 1$,

$$Y_1 = -A + Z_1, \quad Y_2 = -A + Z_2.$$

Here $A$ is a positive constant, and $Z_1 \sim \mathcal{N}(0, \sigma_1^2)$, $Z_2 \sim \mathcal{N}(0, \sigma_2^2)$, and $H$ are independent.

(i) Find an optimal rule for guessing $H$ based on $(Y_1, Y_2)$.
(ii) Draw the decision regions in the $(Y_1, Y_2)$-plane for the special case where $\sigma_1 = 2\sigma_2$.
(iii) Returning to the general case, find a one-dimensional sufficient statistic.
(iv) Find the optimal probability of error in terms of $\sigma_1^2$, $\sigma_2^2$, and $A$.
(v) Consider a suboptimal receiver that declares “H = 0” if $Y_1 + Y_2 > 0$, and otherwise
declares “H = 1.” Evaluate the probability of error for this decoder as a function
of $\sigma_1^2$, $\sigma_2^2$, and $A$.


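Exercise 20.4 (Binary Hypothesis Testing with General Costs). Let $H$ take on the values 0
and 1 according to the prior $(\pi_0, \pi_1)$. The observable $\mathbf{Y}$ has conditional densities $f_{\mathbf{Y}|H=0}(\cdot)$
and $f_{\mathbf{Y}|H=1}(\cdot)$. Based on $\mathbf{Y}$, we wish to guess the value of $H$. Let the guess associated
with $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ be denoted by $\phi_{\text{Guess}}(\mathbf{y}_{\text{obs}})$. Guessing “$H = \eta$” when $H = \nu$ costs $c(\eta, \nu)$,
where $c(\cdot, \cdot)$ is a given function from $\{0,1\} \times \{0,1\}$ to the nonnegative reals. Find a
decision rule that minimizes the expected cost

$$\mathsf{E}\bigl[c(\phi_{\text{Guess}}(\mathbf{Y}), H)\bigr] = \sum_{\nu=0}^{1} \pi_\nu \sum_{\eta=0}^{1} c(\eta, \nu)\, \Pr\bigl[\phi_{\text{Guess}}(\mathbf{Y}) = \eta \bigm| H = \nu\bigr].$$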


Exercise 20.5 (Binary Hypothesis Testing). Let $H$ take on the values 0 and 1 according
to the prior $(\pi_0, \pi_1)$, and let the observation consist of the RV $Y$. Conditional on $H$, the
densities of $Y$ are given for every $y \in \mathbb{R}$ by

$$f_{Y|H=0}(y) = e^{-y}\, \mathrm{I}\{y \ge 0\}, \qquad f_{Y|H=1}(y) = \beta\, e^{-\frac{y^2}{2}}\, \mathrm{I}\{y \ge 0\},$$

where $\beta > 0$ is some constant.
(i) Determine $\beta$.
(ii) Find a decision rule that minimizes the probability of error.
(iii) For the rule that you have found, compute $\Pr(\text{error} \mid H = 0)$.

Hint: Different priors can lead to dramatically different decision rules.
Exercise 20.6 (Bhattacharyya Bound).


(i) Show that the Bhattacharyya Bound never exceeds 1/2. 
(ii) When is it equal to 1/2? 


Hint: You may find the Cauchy-Schwarz Inequality useful. 


Exercise 20.7 (The Bhattacharyya Bound for Conditionally IID Observations). Consider
a binary hypothesis testing problem where, conditional on $H = 0$, the $J$ components of
the observed random vector $\mathbf{Y}$ are IID with each component of density $f_0(\cdot)$. Conditional
on $H = 1$ the components of $\mathbf{Y}$ are IID with each component of density $f_1(\cdot)$. Express
the Bhattacharyya Bound in terms of $J$ and

$$\int_{-\infty}^{\infty} \sqrt{f_0(y)\, f_1(y)} \, dy.$$


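Exercise 20.8 (Error Probability and $\mathcal{L}_1$-Distance). Consider the setting of Theorem 20.5.2
when $H$ has a uniform prior. Show that in this case (20.26) can also be written as

$$\Pr\bigl[\phi_{\text{MAP}}(\mathbf{Y}) \ne H\bigr] = \frac{1}{2} - \frac{1}{4} \int \bigl| f_{\mathbf{Y}|H=0}(\mathbf{y}) - f_{\mathbf{Y}|H=1}(\mathbf{y}) \bigr| \, d\mathbf{y}.$$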


Exercise 20.9 (Conditionally Poisson Observations). A RV $X$ is said to have a Poisson
distribution of parameter (“intensity”) $\lambda$, where $\lambda$ is some nonnegative real number, if $X$
takes value in the nonnegative integers and

$$\Pr[X = n] = e^{-\lambda} \frac{\lambda^n}{n!}, \quad n = 0, 1, 2, \ldots$$

(i) Find the Moment Generating Function of a Poisson RV of intensity $\lambda$.

(ii) Show that if $X$ and $Y$ are independent Poisson random variables of intensities $\lambda_x$
and $\lambda_y$, then their sum $X + Y$ is Poisson with parameter $\lambda_x + \lambda_y$.

(iii) Let $H$ take on the values 0 and 1 according to the prior $(\pi_0, \pi_1)$. We wish to
guess $H$ based on the RV $Y$. Conditional on $H = 0$, the observation $Y$ is Poisson
of intensity $\alpha + \lambda$, whereas conditional on $H = 1$ it is Poisson of intensity $\beta + \lambda$.
Here $\alpha, \beta, \lambda$ are known non-negative constants. Show that the optimal probability
of error is monotonically non-decreasing in $\lambda$.

Hint: For Part (iii) recall Part (ii) and that no randomized decision rule can outperform
an optimal deterministic rule.


Exercise 20.10 (Optical Communication). Consider an optical communication system
that uses binary on/off keying at a rate of $10^{8}$ bits per second. At the beginning of each
time interval of duration $10^{-8}$ seconds a new data bit $D$ enters the transmitter. If $D = 0$,
the laser is turned off for the duration of the interval; otherwise, if $D = 1$, the laser is
turned on. The receiver counts the number $Y$ of photons received during the interval.
Assume that, conditional on $D$, the observation $Y$ is a Poisson RV whose conditional
PMF is

$$\Pr[Y = y \mid D = 0] = \frac{e^{-\mu} \mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots, \tag{20.107}$$

$$\Pr[Y = y \mid D = 1] = \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 0, 1, 2, \ldots, \tag{20.108}$$

where $\lambda > \mu > 0$. Further assume that $\Pr[D = 0] = \Pr[D = 1] = 1/2$.
(i) Find an optimal guessing rule for guessing $D$ based on $Y$.

(ii) Compute the optimal probability of error. (Not necessarily in closed-form.)

(iii) Suppose that we now transmit each data bit over two time intervals, each of duration
$10^{-8}$ seconds. (The system now supports a data rate of $0.5 \times 10^{8}$ bits per second.)
The receiver produces the photon counts $Y_1$ and $Y_2$ over the two intervals. Assume
that, conditional on $D = 0$, the counts $Y_1$ and $Y_2$ are IID with the PMF (20.107)
and that, conditional on $D = 1$, they are IID with the PMF (20.108). Find a
one-dimensional sufficient statistic for the problem and use it to find an optimal
decision rule.

Hint: For Part (iii), recall Part (ii) of Exercise 20.9.


Exercise 20.11 (Monotone Likelihood Ratio and Log-Concavity). Let $H$ take on the
values 0 and 1 according to the nondegenerate prior $(\pi_0, \pi_1)$. Conditional on $H = 0$, the
observation $Y$ is given by

$$Y = \xi_0 + Z,$$

where $\xi_0 \in \mathbb{R}$ is some deterministic number and $Z$ is a RV of PDF $f_Z(\cdot)$. Conditional on
$H = 1$, the observation $Y$ is given by

$$Y = \xi_1 + Z,$$

where $\xi_1 > \xi_0$.

(i) Show that if the PDF $f_Z(\cdot)$ is positive and is such that

$$f_Z(y_1 - \xi_0)\, f_Z(y_0 - \xi_1) \le f_Z(y_1 - \xi_1)\, f_Z(y_0 - \xi_0), \quad (y_1 > y_0,\ \xi_1 > \xi_0), \tag{20.109}$$

then an optimal decision rule is to guess “H = 0” if $Y \le y^\star$ and to guess “H = 1”
if $Y > y^\star$ for some real number $y^\star$.

(ii) Show that if $z \mapsto \log f_Z(z)$ is a concave function, then (20.109) is satisfied.

Mathematicians state this result by saying that if $g\colon \mathbb{R} \to \mathbb{R}$ is positive, then the mapping
$(x, y) \mapsto g(x - y)$ has the Total Positivity property of Order 2 if, and only if, $g$ is log-concave
(Marshall and Olkin, 1979, Chapter 18, Section A, Example A.10). Statisticians
state this result by saying that a location family generated by a positive PDF $f(\cdot)$ has
monotone likelihood ratios if, and only if, $f(\cdot)$ is log-concave. For more on distributions
with monotone likelihood ratios see (Lehmann and Romano, 2005, Chapter 3, Section
3.4).

Hint: For Part (ii) recall that a function $g\colon \mathbb{R} \to \mathbb{R}$ is concave if for any $a < b$ and
$0 < \alpha < 1$ we have $g(\alpha a + (1 - \alpha) b) \ge \alpha g(a) + (1 - \alpha) g(b)$. You may like to proceed as
follows. Show that if $g$ is concave then

$$g(a - \Delta_2) + g(a + \Delta_2) \le g(a - \Delta_1) + g(a + \Delta_1), \quad |\Delta_1| \le |\Delta_2|.$$

Defining $g(z) = \log f_Z(z)$, show that the logarithm of the LHS of (20.109) can be written
as

$$g\Bigl(\bar{y} - \bar{\xi} + \frac{1}{2}\Delta_y + \frac{1}{2}\Delta_\xi\Bigr) + g\Bigl(\bar{y} - \bar{\xi} - \frac{1}{2}\Delta_y - \frac{1}{2}\Delta_\xi\Bigr),$$

where

$$\bar{y} = (y_0 + y_1)/2, \quad \bar{\xi} = (\xi_0 + \xi_1)/2, \quad \Delta_y = y_1 - y_0, \quad \Delta_\xi = \xi_1 - \xi_0.$$

Show that the logarithm of the RHS of (20.109) is given by

$$g\Bigl(\bar{y} - \bar{\xi} + \frac{1}{2}\Delta_y - \frac{1}{2}\Delta_\xi\Bigr) + g\Bigl(\bar{y} - \bar{\xi} + \frac{1}{2}\Delta_\xi - \frac{1}{2}\Delta_y\Bigr).$$
Exercise 20.12 (Is a Uniform Prior the Worst Prior?). Based on an observation $Y$, we
wish to guess the value of a RV $H$ taking on the values 0 and 1 according to the prior
$(\pi_0, \pi_1)$. Conditional on $H = 0$, the observation $Y$ is uniform over the interval $[0, 1]$, and,
conditional on $H = 1$, it is uniform over the interval $[0, 1/2]$.

(i) Find an optimal rule for guessing $H$ based on the observation $Y$. Note that the
rule may depend on $\pi_0$.

(ii) Let $p^*(\text{error}; \pi_0)$ denote the optimal probability of error. Find $p^*(\text{error}; \pi_0)$ and
plot it as a function of $\pi_0$ in the range $0 \le \pi_0 \le 1$.

(iii) Which value of $\pi_0$ maximizes $p^*(\text{error}; \pi_0)$?

Consider now the general problem where the RV $Y$ is of conditional densities $f_{Y|H=0}(\cdot)$,
$f_{Y|H=1}(\cdot)$, and $H$ is of prior $(\pi_0, \pi_1)$. Let $p^*(\text{error}; \pi_0)$ denote the optimal probability of
error for guessing $H$ based on $Y$.

(iv) Prove that

$$p^*\Bigl(\text{error}; \frac{1}{2}\Bigr) \ge \frac{1}{2}\, p^*(\text{error}; \pi_0) + \frac{1}{2}\, p^*(\text{error}; 1 - \pi_0), \quad \pi_0 \in [0, 1]. \tag{20.110a}$$

(v) Show that if the densities $f_{Y|H=0}(\cdot)$ and $f_{Y|H=1}(\cdot)$ satisfy

$$f_{Y|H=0}(y) = f_{Y|H=1}(-y), \quad y \in \mathbb{R}, \tag{20.110b}$$

then

$$p^*(\text{error}; \pi_0) = p^*(\text{error}; 1 - \pi_0), \quad \pi_0 \in [0, 1]. \tag{20.110c}$$

(vi) Show that if (20.110b) holds, then the uniform prior is the worst prior:

$$p^*(\text{error}; \pi_0) \le p^*(\text{error}; 1/2), \quad \pi_0 \in [0, 1]. \tag{20.110d}$$

Hint: For Part (iv) you might like to consider a new setup. In the new setup $H = M \oplus S$,
where $\oplus$ denotes the exclusive-or operation and where the binary random variables $M$
and $S$ are independent with $S$ taking value in $\{0, 1\}$ equiprobably and with $\Pr[M = 0] =
1 - \Pr[M = 1] = \pi_0$. Assume that in the new setup (M, S)⊸−H⊸−Y and that the
conditional density of $Y$ given $H = 0$ is $f_{Y|H=0}(\cdot)$ and given $H = 1$ it is $f_{Y|H=1}(\cdot)$.
Compare now the performance of an optimal decision rule for guessing $H$ based on $Y$
with the performance of an optimal decision rule for guessing $H$ based on the pair $(Y, S)$.
Express these probabilities of error in terms of the parameters of the original problem.


Exercise 20.13 (Hypothesis Testing with a Random Parameter). Let Y = X + AZ, 
where X, A, and Z are independent random variables with X taking on the values +1 
equiprobably, $A$ taking on the values 2 and 3 equiprobably, and $Z \sim \mathcal{N}(0, \sigma^2)$.


(i) Find an optimal rule for guessing X based on the pair (Y, A). 
(ii) Repeat when you observe only Y. 


Exercise 20.14 (Bounding the Conditional Probability of Error). Show that when the
prior is uniform

$$p_{\text{MAP}}(\text{error} \mid H = 0) \le \int \sqrt{f_{Y|H=0}(y)\, f_{Y|H=1}(y)}\, dy.$$


Exercise 20.15 (Upper Bounds on the Conditional Probability of Error).


(i) Let H take on the values 0 and 1 according to the nondegenerate prior $(\pi_0, \pi_1)$. Let
the observation Y have the conditional densities $f_{Y|H=0}(\cdot)$ and $f_{Y|H=1}(\cdot)$. Show
that for every $\rho > 0$

$$p_{\text{MAP}}(\text{error} \mid H = 0) \le \Bigl(\frac{\pi_1}{\pi_0}\Bigr)^{\rho} \int f_{Y|H=1}^{\rho}(y)\, f_{Y|H=0}^{1-\rho}(y)\, dy.$$


(ii) A suboptimal decoder guesses “H = 0” if $q_0(y) > q_1(y)$; guesses “H = 1” if
$q_0(y) < q_1(y)$; and otherwise tosses a coin. Here $q_0(\cdot)$ and $q_1(\cdot)$ are arbitrary
positive functions. Show that for this decoder

$$p(\text{error} \mid H = 0) \le \int \Bigl(\frac{q_1(y)}{q_0(y)}\Bigr)^{\rho} f_{Y|H=0}(y)\, dy, \quad \rho > 0.$$


Hint: In Part (i) show that you can upper-bound $\mathrm{I}\{\pi_1 f_{Y|H=1}(y) / (\pi_0 f_{Y|H=0}(y)) \ge 1\}$ by
$\bigl(\pi_1 f_{Y|H=1}(y) / (\pi_0 f_{Y|H=0}(y))\bigr)^{\rho}$.


Exercise 20.16 (The Hellinger Distance). The Hellinger distance between the densities
f(·) and g(·) is defined as the square root of

$$\frac{1}{2} \int \Bigl(\sqrt{f(\xi)} - \sqrt{g(\xi)}\Bigr)^2 d\xi$$


(though some authors drop the one-half). 
(i) Show that the Hellinger distance between f(-) and h(-) is upper-bounded by the 
sum of the Hellinger distances between f(-) and g(-) and between g(-) and h(-). 
(ii) Relate the Hellinger distance to the Bhattacharyya Bound. 
(iii) Show that the Hellinger distance is upper-bounded by one. 


Exercise 20.17 (Artifacts of Suboptimality). Let H take on the values 0 and 1 equiprobably.
Conditional on H = 0, the observation Y is $\mathcal{N}(1, \sigma^2)$, and, conditional on H = 1,
it is $\mathcal{N}(-1, \sigma^2)$. Alice guesses “H = 0” if Y > 2 and guesses “H = 1” otherwise.


(i) Compute the probability that Alice errs as a function of $\sigma^2$.

(ii) Show that this probability is not monotonically nondecreasing in $\sigma^2$.

(iii) Does her guessing rule minimize the probability of error?

(iv) Show that if you are obliged to use her rule, then adding noise to Y prior to feeding
it to her detector may be beneficial.


Exercise 20.18 (The Bhattacharyya Bound and a Random Parameter). Let $\Theta$ be independent
of H and of density $f_{\Theta}(\cdot)$. Express the Bhattacharyya Bound on the probability
of guessing H incorrectly in terms of $f_{\Theta}(\cdot)$, $f_{Y|\Theta=\theta,H=0}(\cdot)$ and $f_{Y|\Theta=\theta,H=1}(\cdot)$. Treat the
case where $\Theta$ is not observed and the case where it is observed separately. Show that the
Bhattacharyya Bound in the former case is always at least as large as in the latter case.


Chapter 21 


Multi-Hypothesis Testing 


21.1 Introduction


In Chapter 20 we discussed how to guess the outcome of a binary random variable. 
We now extend the discussion to random variables that take on more than two—but 
still a finite—number of values. Statisticians call this problem “multi-hypothesis 
testing” to indicate that there may be more than two hypotheses. Rather than 
using H, we now denote the random variable whose outcome we wish to guess 
by M. (In Chapter 20 we used H for “hypothesis;” now we use M for “message.” ) 
We denote the number of possible values that M can take by $\mathsf{M}$ and assume that
$\mathsf{M} \ge 2$. (The case $\mathsf{M} = 2$ corresponds to binary hypothesis testing.) As before the
“labels” are not important and there is no loss in generality in assuming that M
takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$. (In the binary case we used the traditional
labels 0 and 1 but now we prefer $1, 2, \ldots, \mathsf{M}$.)


21.2 The Setup 


A random variable M takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$,
according to the prior

$$\pi_m = \Pr[M = m], \quad m \in \mathcal{M}, \tag{21.1}$$

where

$$\pi_m \ge 0, \quad m \in \mathcal{M}, \tag{21.2}$$

and where

$$\sum_{m \in \mathcal{M}} \pi_m = 1. \tag{21.3}$$

We say that the prior is nondegenerate if

$$\pi_m > 0, \quad m \in \mathcal{M}, \tag{21.4}$$

with the inequalities being strict, so M can take on any value in $\mathcal{M}$ with positive
probability. We say that the prior is uniform if

$$\pi_1 = \cdots = \pi_{\mathsf{M}} = \frac{1}{\mathsf{M}}. \tag{21.5}$$

The observation is a random vector Y taking value in $\mathbb{R}^d$. We assume that for
each $m \in \mathcal{M}$ the distribution of Y conditional on M = m has the density¹

$$f_{Y|M=m}(\cdot), \quad m \in \mathcal{M}, \tag{21.6}$$

where $f_{Y|M=m}(\cdot)$ is a nonnegative Borel measurable function that integrates to
one over $\mathbb{R}^d$.


A guessing rule is a Borel measurable function $\phi_{\text{Guess}} \colon \mathbb{R}^d \to \mathcal{M}$ from the space
of possible observations $\mathbb{R}^d$ to the set of possible messages $\mathcal{M}$. We think about
$\phi_{\text{Guess}}(y_{\text{obs}})$ as the guess we form after observing that $Y = y_{\text{obs}}$. The error
probability associated with the guessing rule $\phi_{\text{Guess}}(\cdot)$ is given by

$$\Pr[\phi_{\text{Guess}}(Y) \ne M]. \tag{21.7}$$


Note that two sources of randomness determine whether we err or not: the real- 
ization of M and the generation of Y conditional on that realization. A guessing 
rule is said to be optimal if no other guessing rule achieves a lower probability 
of error.² The optimal error probability p*(error) is the probability of error
associated with an optimal decision rule. In this chapter we shall derive optimal 
decision rules and study the optimal probability of error. 


21.3 Optimal Guessing 


Having observed that $Y = y_{\text{obs}}$, we would like to guess M. An optimal guessing
rule can be derived, as in the binary case, by first considering the scenario where 
there are no observables. Its extension to the more interesting case where we 
observe Y is straightforward. 


21.3.1 Guessing in the Absence of Observables 


In this scenario there are only $\mathsf{M}$ deterministic decision rules to choose from: the
decision rule “guess 1”, the decision rule “guess 2”, etc. If we employ the “guess 1”
rule, then we are correct if M is indeed equal to 1 and thus with probability of
success $\pi_1$ and corresponding probability of error of $1 - \pi_1$. In general, if we employ
the “guess m” rule for some $m \in \mathcal{M}$, then our probability of success is $\pi_m$. Thus,
of the $\mathsf{M}$ different rules at our disposal, the one that has the highest probability
of success is the “guess $\tilde{m}$” rule, where $\tilde{m}$ is the outcome that is a priori the most
likely. If this $\tilde{m}$ is not unique, then guessing any one of the outcomes that have
the highest a priori probability is optimal.


¹ We feel no remorse for limiting ourselves to conditional distributions possessing a density.
The reason is that, while the reader is encouraged to assume that the densities are with respect to
the Lebesgue measure, this assumption is never used in the text. And using the Radon-Nikodym
Theorem (Billingsley, 1995, Section 32), one can show that even in the most general case there
exists a measure on $\mathbb{R}^d$ with respect to which the conditional laws of Y conditional on each of
the possible values of M are absolutely continuous. That measure can be taken, for example, as
the sum of the conditional laws corresponding to each of the possible values that M can take.

² As in the case of binary hypothesis testing, an optimal guessing rule always exists.
We conclude that in the absence of observables, the guessing rule “guess $\tilde{m}$” is
optimal if, and only if,

$$\pi_{\tilde{m}} = \max_{m' \in \mathcal{M}} \pi_{m'}. \tag{21.8}$$

For an optimal guessing rule the probability of success is

$$p^*(\text{correct}) = \max_{m' \in \mathcal{M}} \{\pi_{m'}\}, \tag{21.9}$$

and the optimal error probability is thus

$$p^*(\text{error}) = 1 - \max_{m' \in \mathcal{M}} \{\pi_{m'}\}. \tag{21.10}$$


21.3.2 The Joint Law of M and Y 


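Using the prior $\{\pi_m\}$ and the conditional densities $\{f_{Y|M=m}(\cdot)\}$, we can express
the unconditional density of Y as

$$f_Y(y) = \sum_{m \in \mathcal{M}} \pi_m f_{Y|M=m}(y), \quad y \in \mathbb{R}^d. \tag{21.11}$$

As in Section 20.4, we define for every $m \in \mathcal{M}$ and for every $y_{\text{obs}} \in \mathbb{R}^d$ the
conditional probability that M = m conditional on $Y = y_{\text{obs}}$ by

$$\Pr[M = m \mid Y = y_{\text{obs}}] \triangleq
\begin{cases}
\dfrac{\pi_m f_{Y|M=m}(y_{\text{obs}})}{f_Y(y_{\text{obs}})} & \text{if } f_Y(y_{\text{obs}}) > 0, \\[2ex]
\dfrac{1}{\mathsf{M}} & \text{otherwise.}
\end{cases} \tag{21.12}$$

By an argument similar to the one proving (20.12) we have

$$\Pr\bigl[Y \in \{y \in \mathbb{R}^d : f_Y(y) = 0\}\bigr] = 0, \tag{21.13}$$

which can also be written as

$$\Pr\bigl[f_Y(Y) = 0\bigr] = 0.$$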
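As an illustration of (21.11) and (21.12), the following Python sketch computes the a posteriori probabilities for a scalar observation. The function names, the three uniform densities, and the numerical prior are hypothetical choices made only for this example; they do not come from the text.

```python
import numpy as np

def posterior(prior, cond_densities, y_obs):
    """Return Pr[M = m | Y = y_obs] for m = 1..M, following (21.11)-(21.12)."""
    prior = np.asarray(prior, dtype=float)
    # Likelihoods f_{Y|M=m}(y_obs), one per hypothesis.
    lik = np.array([f(y_obs) for f in cond_densities], dtype=float)
    joint = prior * lik            # pi_m * f_{Y|M=m}(y_obs)
    f_y = joint.sum()              # unconditional density, (21.11)
    if f_y > 0:
        return joint / f_y         # first branch of (21.12)
    # Second branch of (21.12): the convention 1/M when f_Y(y_obs) = 0.
    return np.full(len(prior), 1.0 / len(prior))

def uniform_density(lo, hi):
    """Density of the uniform law on [lo, hi] (illustrative helper)."""
    return lambda y: 1.0 / (hi - lo) if lo <= y <= hi else 0.0

prior = [0.5, 0.3, 0.2]                                   # assumed prior
densities = [uniform_density(0, 1), uniform_density(0, 2), uniform_density(1, 3)]

post_inside = posterior(prior, densities, 0.5)    # a point where f_Y > 0
post_outside = posterior(prior, densities, 5.0)   # a point where f_Y = 0
print(post_inside, post_outside)
```

The second call exercises the "otherwise" branch of (21.12), which by (21.13) is encountered with probability zero.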


21.3.3 Guessing in the Presence of Observables


The problem of guessing in the presence of an observable is very similar to the 
one without observables. The intuition is that after observing that $Y = y_{\text{obs}}$, we
associate with each $m \in \mathcal{M}$ the a posteriori probability $\Pr[M = m \mid Y = y_{\text{obs}}]$ and
then guess M as though there were no observables. Thus, rather than choosing 
the message that has the highest a priori probability as we do in the absence of 
observables, we should now choose the message that has the highest a posteriori 
probability. 


After having observed that $Y = y_{\text{obs}}$ we should thus guess “$\tilde{m}$” where $\tilde{m}$ is the
outcome in $\mathcal{M}$ that has the highest a posteriori probability. If more than one outcome
attains the highest a posteriori probability, then we say that a tie has occurred
and we need to resolve this tie by picking one (it does not matter which) of the
outcomes that attains the maximum a posteriori probability. We thus guess “$\tilde{m}$,”
in analogy to (21.8), only if

$$\Pr[M = \tilde{m} \mid Y = y_{\text{obs}}] = \max_{m' \in \mathcal{M}} \bigl\{\Pr[M = m' \mid Y = y_{\text{obs}}]\bigr\}.$$


(We shall later define the Maximum A Posteriori guessing rule as a randomized 
decision rule that picks uniformly at random from the outcomes that have the 
highest a posteriori probability; see Definition 21.3.2 ahead.) 


In analogy with (21.9) we have that for this optimal rule

$$p^*(\text{correct} \mid Y = y_{\text{obs}}) = \max_{m' \in \mathcal{M}} \bigl\{\Pr[M = m' \mid Y = y_{\text{obs}}]\bigr\},$$

and in analogy with (21.10),

$$p^*(\text{error} \mid Y = y_{\text{obs}}) = 1 - \max_{m' \in \mathcal{M}} \bigl\{\Pr[M = m' \mid Y = y_{\text{obs}}]\bigr\}.$$

Consequently, the unconditional optimal probability of error can be expressed as

$$p^*(\text{error}) = \int_{\mathbb{R}^d} \Bigl(1 - \max_{m' \in \mathcal{M}} \bigl\{\Pr[M = m' \mid Y = y]\bigr\}\Bigr) f_Y(y)\, dy,$$

where $f_Y(\cdot)$ is the unconditional density function of Y and is given in (21.11).
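The integral expression for p*(error) can be evaluated numerically in simple cases. The sketch below assumes a scalar observation, three Gaussian conditional densities, and a particular prior; all of these values, and the grid used for the Riemann sum, are illustrative assumptions rather than part of the text. It uses the fact that, wherever $f_Y(y) > 0$, the integrand equals $f_Y(y) - \max_{m'} \pi_{m'} f_{Y|M=m'}(y)$.

```python
import numpy as np

def gaussian_pdf(y, mean, sigma):
    """Density of a N(mean, sigma^2) random variable."""
    return np.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

prior = np.array([0.2, 0.5, 0.3])     # assumed prior
means = np.array([-2.0, 0.0, 2.0])    # assumed conditional means
sigma = 1.0                           # assumed common standard deviation

# Fine grid covering essentially all of the probability mass.
y = np.linspace(-12.0, 12.0, 200001)
dy = y[1] - y[0]

# joint[m, k] = pi_m * f_{Y|M=m}(y_k)
joint = prior[:, None] * gaussian_pdf(y[None, :], means[:, None], sigma)
f_y = joint.sum(axis=0)               # unconditional density, (21.11)

# (1 - max_m Pr[M=m|Y=y]) f_Y(y) = f_Y(y) - max_m pi_m f_{Y|M=m}(y)
integrand = f_y - joint.max(axis=0)
p_error = integrand.sum() * dy        # Riemann-sum approximation of p*(error)

p_error_no_obs = 1.0 - prior.max()    # (21.10): optimal error without observables
print(p_error, p_error_no_obs)
```

Comparing the two printed numbers shows how much the observation helps in this particular example.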


We next proceed to make the above intuitive discussion more rigorous. We begin
by defining for every possible observation $y_{\text{obs}} \in \mathbb{R}^d$ the set of outcomes of maximal
a posteriori probability:

$$\tilde{\mathcal{M}}(y_{\text{obs}}) \triangleq \Bigl\{\tilde{m} \in \mathcal{M} : \Pr[M = \tilde{m} \mid Y = y_{\text{obs}}] = \max_{m' \in \mathcal{M}} \Pr[M = m' \mid Y = y_{\text{obs}}]\Bigr\}. \tag{21.14}$$

As we next argue, this set can also be expressed as

$$\tilde{\mathcal{M}}(y_{\text{obs}}) = \Bigl\{\tilde{m} \in \mathcal{M} : \pi_{\tilde{m}} f_{Y|M=\tilde{m}}(y_{\text{obs}}) = \max_{m' \in \mathcal{M}} \pi_{m'} f_{Y|M=m'}(y_{\text{obs}})\Bigr\}. \tag{21.15}$$

This can be shown by treating the case $f_Y(y_{\text{obs}}) > 0$ and the case $f_Y(y_{\text{obs}}) = 0$
separately. In the former case, (21.15) is verified by noting that in this case we
have, by (21.12), that $\Pr[M = m' \mid Y = y_{\text{obs}}] = \pi_{m'} f_{Y|M=m'}(y_{\text{obs}}) / f_Y(y_{\text{obs}})$, so
the result follows because scaling the scores of all the elements of a set by a positive
number that is common to them all ($1/f_Y(y_{\text{obs}})$) does not change the subset of
the elements with the highest score. In the latter case we note that, by (21.12),
we have for all $m' \in \mathcal{M}$ that $\Pr[M = m' \mid Y = y_{\text{obs}}] = 1/\mathsf{M}$, so the RHS of (21.14)
is $\mathcal{M}$ and we also have by (21.11) for all $m' \in \mathcal{M}$ that $\pi_{m'} f_{Y|M=m'}(y_{\text{obs}}) = 0$ so
the RHS of (21.15) is also $\mathcal{M}$.


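Using the above definition of $\tilde{\mathcal{M}}(y_{\text{obs}})$ we can now state the main theorem regarding
optimal guessing rules.


Theorem 21.3.1 (Optimal Multi-Hypothesis Testing). Let M take value in the set
$\mathcal{M} = \{1, \ldots, \mathsf{M}\}$ with the prior (21.1), and let the observation Y be a random vector
taking value in $\mathbb{R}^d$ with conditional densities $f_{Y|M=1}(\cdot), \ldots, f_{Y|M=\mathsf{M}}(\cdot)$. Any
guessing rule $\phi^*_{\text{Guess}} \colon \mathbb{R}^d \to \mathcal{M}$ that satisfies

$$\phi^*_{\text{Guess}}(y_{\text{obs}}) \in \tilde{\mathcal{M}}(y_{\text{obs}}), \quad y_{\text{obs}} \in \mathbb{R}^d \tag{21.16}$$

is optimal. Here $\tilde{\mathcal{M}}(y_{\text{obs}})$ is the set defined in (21.14) or (21.15).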
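Proof. Every (deterministic) guessing rule induces a partitioning of the space of
possible outcomes $\mathbb{R}^d$ into $\mathsf{M}$ disjoint sets $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$:

$$\bigcup_{m=1}^{\mathsf{M}} \mathcal{D}_m = \mathbb{R}^d, \tag{21.17a}$$

$$\mathcal{D}_m \cap \mathcal{D}_{m'} = \emptyset, \quad m \ne m', \tag{21.17b}$$

where $\mathcal{D}_m$ is the set of observations that result in the guessing rule producing
the guess “M = m.” Conversely, every partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ of $\mathbb{R}^d$ corresponds
to some deterministic guessing rule that guesses “M = m” whenever $y_{\text{obs}} \in \mathcal{D}_m$.
Searching for an optimal decision rule is thus equivalent to searching for an optimal
way to partition $\mathbb{R}^d$. For every partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ the probability of success of
the guessing rule associated with it is given by

$$\begin{aligned}
\Pr(\text{correct}) &= \sum_{m \in \mathcal{M}} \pi_m \Pr(\text{correct} \mid M = m) \\
&= \sum_{m \in \mathcal{M}} \pi_m \int_{\mathcal{D}_m} f_{Y|M=m}(y)\, dy \\
&= \sum_{m \in \mathcal{M}} \pi_m \int_{\mathbb{R}^d} f_{Y|M=m}(y)\, \mathrm{I}\{y \in \mathcal{D}_m\}\, dy \\
&= \int_{\mathbb{R}^d} \Bigl(\sum_{m \in \mathcal{M}} \pi_m f_{Y|M=m}(y)\, \mathrm{I}\{y \in \mathcal{D}_m\}\Bigr) dy.
\end{aligned}$$

To minimize the probability of error we maximize the probability of correct decision.
We thus need to find a partition $\mathcal{D}_1, \ldots, \mathcal{D}_{\mathsf{M}}$ that maximizes the last integral.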


To maximize the integral we shall maximize the integrand

$$\sum_{m \in \mathcal{M}} \pi_m f_{Y|M=m}(y)\, \mathrm{I}\{y \in \mathcal{D}_m\}.$$

For a fixed value of y, the value of the integrand depends on the set to which we
have assigned y. If y was assigned to $\mathcal{D}_1$ (i.e., if $y \in \mathcal{D}_1$), then all the terms in the
sum except for the first are zero, and the value of the integrand is $\pi_1 f_{Y|M=1}(y)$.
More generally, if y was assigned to $\mathcal{D}_m$, then all the terms in the sum except for
the m-th term are zero, and the value of the integrand is $\pi_m f_{Y|M=m}(y)$. For a
fixed value of y, the integrand will thus be maximized if we assign y to the set $\mathcal{D}_{\tilde{m}}$
(and correspondingly guess $\tilde{m}$), only if

$$\pi_{\tilde{m}} f_{Y|M=\tilde{m}}(y) = \max_{m' \in \mathcal{M}} \bigl\{\pi_{m'} f_{Y|M=m'}(y)\bigr\}.$$

Thus, if $\phi^*_{\text{Guess}}(\cdot)$ satisfies the theorem’s hypotheses, then it maximizes the
integrand for every $y \in \mathbb{R}^d$ and thus also maximizes the probability of guessing
correctly.
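The partition argument can be checked by brute force in a toy setting. The sketch below assumes, purely for illustration, that the observation takes only four values, so that the conditional "densities" are probability mass functions (densities with respect to the counting measure) and every deterministic guessing rule can be enumerated. The numerical prior and conditional laws are made up for the example.

```python
import itertools
import numpy as np

prior = np.array([0.5, 0.3, 0.2])     # assumed prior on three messages

# Row m holds the assumed conditional pmf of Y given M = m+1 over {0, 1, 2, 3}.
cond = np.array([
    [0.40, 0.30, 0.20, 0.10],
    [0.10, 0.20, 0.30, 0.40],
    [0.25, 0.25, 0.25, 0.25],
])
joint = prior[:, None] * cond         # pi_m * f_{Y|M=m}(y)

def prob_correct(rule):
    """Success probability of the rule y -> rule[y] (0-based message index)."""
    return sum(joint[m, y] for y, m in enumerate(rule))

# Enumerate all 3^4 deterministic rules, i.e., all partitions D_1, D_2, D_3.
all_rules = itertools.product(range(len(prior)), repeat=cond.shape[1])
best_by_search = max(prob_correct(rule) for rule in all_rules)

# A rule satisfying (21.16): for each y pick a maximizer of pi_m f_{Y|M=m}(y).
map_rule = tuple(int(m) for m in joint.argmax(axis=0))
best_by_map = prob_correct(map_rule)
print(map_rule, best_by_map, best_by_search)
```

In this example no rule found by exhaustive search does better than the rule that maximizes $\pi_m f_{Y|M=m}(y)$ pointwise, in agreement with the theorem.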


21.3.4 The MAP and ML Rules 


As in the binary hypothesis testing case, we can also consider randomized decision 
rules. Extending the definition of a randomized decision rule to our setting, one 
can show using arguments very similar to those of Section 20.6 that randomization
does not help: no randomized decision rule can yield a smaller probability of error 
than an optimal deterministic rule. But randomized decision rules can yield more 
symmetric or more “fair” rules. Indeed, we shall define the MAP rule as the 
randomized rule that resolves ties by choosing one of the messages that achieves 
the highest a posteriori probability uniformly at random: 


Definition 21.3.2 (The M-ary MAP Decision Rule). The Maximum A Posteriori
decision rule is the guessing rule that, after observing that $Y = y_{\text{obs}}$, forms
a guess by picking uniformly at random an element of the set $\tilde{\mathcal{M}}(y_{\text{obs}})$, which is
defined in (21.14) or (21.15).


Theorem 21.3.3 (The MAP Rule Is Optimal). For the setting of Theorem 21.3.1 
the MAP decision rule is optimal in the sense that it achieves the smallest proba- 
bility of error among all deterministic or randomized decision rules. Thus, 


$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m\, p_{\text{MAP}}(\text{error} \mid M = m), \tag{21.18}$$

where $p^*(\text{error})$ denotes the optimal probability of error and $p_{\text{MAP}}(\text{error} \mid M = m)$
denotes the conditional probability of error of the MAP rule.


Proof. Irrespective of the realization of the randomization that is used to pick
an element of $\tilde{\mathcal{M}}(y_{\text{obs}})$, the resulting decision rule is optimal (Theorem 21.3.1).
Consequently, the average probability of error that results when we average over 
this source of randomness must also be optimal. 


The Maximum-Likelihood (ML) rule ignores the prior. It is identical to the
MAP rule when the prior is uniform. Having observed that $Y = y_{\text{obs}}$, the ML
decoder produces as its guess a member of the set

$$\Bigl\{\tilde{m} \in \mathcal{M} : f_{Y|M=\tilde{m}}(y_{\text{obs}}) = \max_{m' \in \mathcal{M}} f_{Y|M=m'}(y_{\text{obs}})\Bigr\}$$

that is drawn uniformly at random.


The ML decoder thus guesses “M = $\tilde{m}$” only if

$$f_{Y|M=\tilde{m}}(y_{\text{obs}}) = \max_{m' \in \mathcal{M}} f_{Y|M=m'}(y_{\text{obs}}). \tag{21.19}$$


(If more than one outcome achieves this maximum, it chooses uniformly at random 
one of the outcomes that achieves the maximum.) 
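The difference between the MAP and ML rules, and the uniform tie-breaking, can be sketched as follows. The helper names and the numerical prior and likelihood values are hypothetical; the values are chosen as exact binary fractions so that the ties are exact in floating-point arithmetic.

```python
import numpy as np

def candidate_set(scores):
    """0-based indices of the hypotheses attaining the maximal score."""
    scores = np.asarray(scores, dtype=float)
    return np.flatnonzero(scores >= scores.max())

def map_guess(prior, likelihoods, rng):
    """MAP rule: uniform draw among the maximizers of pi_m * f_{Y|M=m}(y_obs)."""
    cands = candidate_set(np.asarray(prior) * np.asarray(likelihoods))
    return int(rng.choice(cands)) + 1      # messages are labeled 1..M

def ml_guess(likelihoods, rng):
    """ML rule: uniform draw among the maximizers of f_{Y|M=m}(y_obs)."""
    cands = candidate_set(likelihoods)
    return int(rng.choice(cands)) + 1

rng = np.random.default_rng(0)
prior = [0.5, 0.25, 0.25]          # assumed prior
likelihoods = [0.25, 0.5, 0.5]     # assumed values of f_{Y|M=m}(y_obs), m = 1, 2, 3

map_candidates = candidate_set(np.array(prior) * np.array(likelihoods)) + 1
ml_candidates = candidate_set(likelihoods) + 1
print(map_candidates, ml_candidates, map_guess(prior, likelihoods, rng), ml_guess(likelihoods, rng))
```

Here all three messages tie under the MAP criterion, whereas only Messages 2 and 3 tie under the ML criterion, because the ML rule ignores the prior.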


21.3.5 Processing 


As in Section 20.11, we say that Z is the result of processing Y with respect to M
if

M ⊸− Y ⊸− Z

forms a Markov chain. In analogy to Theorem 20.11.5, one can prove that if Z is
the result of processing Y with respect to M, then no decision rule based on Z can 
outperform an optimal decision rule based on Y. 
Figure 21.1: Eight equiprobable hypotheses; the situation corresponds to 8-PSK.


21.4 Example: Multi-Hypothesis Testing for 2D Signals 


21.4.1 The Setup 


Consider the case where M is uniformly distributed over the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$
and where we would like to guess the outcome of M based on an observation
consisting of a two-dimensional random vector Y of components $Y^{(1)}$ and $Y^{(2)}$.
Conditional on M = m, the random variables $Y^{(1)}$ and $Y^{(2)}$ are independent
with $Y^{(1)} \sim \mathcal{N}(a_m, \sigma^2)$ and $Y^{(2)} \sim \mathcal{N}(b_m, \sigma^2)$. We assume that $\sigma^2 > 0$, so the
conditional densities can be written for every $m \in \mathcal{M}$ and every $y^{(1)}, y^{(2)} \in \mathbb{R}$ as

$$f_{Y^{(1)},Y^{(2)}|M=m}\bigl(y^{(1)}, y^{(2)}\bigr) = \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{(y^{(1)} - a_m)^2 + (y^{(2)} - b_m)^2}{2\sigma^2}\Bigr). \tag{21.20}$$

This hypothesis testing problem is related to QAM communication over an additive
white Gaussian noise channel with a pulse shape that is orthogonal to its time shifts
by integer multiples of the baud period. The setup is demonstrated in Figure 21.1
for the special case of $\mathsf{M} = 8$ with

$$a_m = A \cos\Bigl(\frac{2\pi m}{8}\Bigr), \quad b_m = A \sin\Bigl(\frac{2\pi m}{8}\Bigr), \quad m = 1, \ldots, 8. \tag{21.21}$$


This special case is related to 8-PSK communication, where M-PSK stands for 
M-ary Phase Shift Keying. 


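21.4.2 The “Nearest-Neighbor” Decoding Rule


We shall next derive an optimal decision rule. For typographical reasons we shall
use y rather than $y_{\text{obs}}$ to denote the observed vector. To find an optimal decoding
rule we note that, since M has a uniform prior, the Maximum-Likelihood rule
(21.19) is optimal. Now $\tilde{m}$ maximizes the likelihood function if, and only if,

$$\begin{aligned}
& f_{Y^{(1)},Y^{(2)}|M=\tilde{m}}\bigl(y^{(1)}, y^{(2)}\bigr) = \max_{m' \in \mathcal{M}} \Bigl\{ f_{Y^{(1)},Y^{(2)}|M=m'}\bigl(y^{(1)}, y^{(2)}\bigr) \Bigr\} \\
\iff{} & \frac{1}{2\pi\sigma^2}\, e^{-\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2}} = \max_{m' \in \mathcal{M}} \Bigl\{ \frac{1}{2\pi\sigma^2}\, e^{-\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2}} \Bigr\} \\
\iff{} & e^{-\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2}} = \max_{m' \in \mathcal{M}} \Bigl\{ e^{-\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2}} \Bigr\} \\
\iff{} & -\frac{(y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ -\frac{(y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2}{2\sigma^2} \Bigr\} \\
\iff{} & (y^{(1)} - a_{\tilde{m}})^2 + (y^{(2)} - b_{\tilde{m}})^2 = \min_{m' \in \mathcal{M}} \Bigl\{ (y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2 \Bigr\} \\
\iff{} & \|y - s_{\tilde{m}}\| = \min_{m' \in \mathcal{M}} \bigl\{ \|y - s_{m'}\| \bigr\},
\end{aligned}$$

where $y = (y^{(1)}, y^{(2)})^{\mathsf{T}}$, $s_m \triangleq (a_m, b_m)^{\mathsf{T}}$ for $m \in \mathcal{M}$, and $\|\cdot\|$ denotes the Euclidean
distance (23.4). It is thus seen that the ML rule (which is equivalent to the MAP
rule because the prior is uniform) is equivalent to a “nearest-neighbor” decoding
rule, which chooses the hypothesis under which the mean vector is closest to the
observed vector (with ties being resolved at random). Figure 21.2 depicts the
nearest-neighbor decoding rule for 8-PSK. The shaded region corresponds to the
set of observables that result in the guess “M = 1,” i.e., the set of points that are
nearest to $(A \cos(2\pi/8), A \sin(2\pi/8))$.

Figure 21.2: Shaded region corresponds to observations leading the ML rule to
guess “M = 1.”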
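The equivalence between the ML rule and nearest-neighbor decoding can be illustrated with a short Python sketch for the 8-PSK constellation of (21.21). The values of A and σ, the function names, and the random test points are assumptions made only for this illustration; ties (which occur with probability zero) are ignored here, so `argmin` and `argmax` simply return the first optimizer.

```python
import numpy as np

A, sigma = 1.0, 0.5                       # illustrative values
m = np.arange(1, 9)
# Row m-1 holds the mean vector s_m = (a_m, b_m) of (21.21).
s = np.column_stack((A * np.cos(2 * np.pi * m / 8), A * np.sin(2 * np.pi * m / 8)))

def log_likelihoods(y):
    """Logarithm of the conditional density (21.20) for m = 1..8."""
    d2 = ((y - s) ** 2).sum(axis=1)
    return -d2 / (2 * sigma ** 2) - np.log(2 * np.pi * sigma ** 2)

def ml_guess(y):
    """Message (1..8) maximizing the likelihood."""
    return int(np.argmax(log_likelihoods(y))) + 1

def nearest_neighbor_guess(y):
    """Message (1..8) whose mean vector is closest to y."""
    return int(np.argmin(((y - s) ** 2).sum(axis=1))) + 1

rng = np.random.default_rng(1)
test_points = rng.normal(scale=2.0, size=(1000, 2))
agree = all(ml_guess(y) == nearest_neighbor_guess(y) for y in test_points)
print(agree, nearest_neighbor_guess(np.array([-1.0, 0.1])))
```

Working with the logarithm of the density avoids numerical underflow for observations far from the constellation and does not change the maximizer.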
Figure 21.3: Contour lines of the density $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$. Shaded region corresponds
to guessing “M = 4”.


21.4.3 Exact Error Analysis for 8-PSK


The analysis of the probability of error can be a bit tricky. Here we only present 
the analysis for 8-PSK. If nothing else, it will motivate us to seek more easily 
computable bounds. 


We shall compute the probability of error conditional on M = 4. But there is 
nothing special about this choice; the rotational symmetry of the problem implies 
that the probability of error does not depend on the hypothesis. 


Conditional on M = 4, the observables $(Y^{(1)}, Y^{(2)})^{\mathsf{T}}$ can be expressed as

$$(Y^{(1)}, Y^{(2)})^{\mathsf{T}} = (-A, 0)^{\mathsf{T}} + (Z^{(1)}, Z^{(2)})^{\mathsf{T}},$$

where $Z^{(1)}$ and $Z^{(2)}$ are independent $\mathcal{N}(0, \sigma^2)$ random variables:

$$f_{Z^{(1)},Z^{(2)}}\bigl(z^{(1)}, z^{(2)}\bigr) = \frac{1}{2\pi\sigma^2} \exp\Bigl(-\frac{(z^{(1)})^2 + (z^{(2)})^2}{2\sigma^2}\Bigr), \quad z^{(1)}, z^{(2)} \in \mathbb{R}.$$

Figure 21.3 depicts the contour lines of the density $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$, which are
centered on the mean $(a_4, b_4) = (-A, 0)$. Note that $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$ is symmetric
about the horizontal axis:

$$f_{Y^{(1)},Y^{(2)}|M=4}\bigl(y^{(1)}, -y^{(2)}\bigr) = f_{Y^{(1)},Y^{(2)}|M=4}\bigl(y^{(1)}, y^{(2)}\bigr), \quad y^{(1)}, y^{(2)} \in \mathbb{R}. \tag{21.22}$$

The shaded region in the figure is the set of pairs $(y^{(1)}, y^{(2)})$ that cause the nearest-neighbor
decoder to guess “M = 4.”³ Conditional on M = 4 an error results if
$(Y^{(1)}, Y^{(2)})$ is outside the shaded region.


Referring now to Figure 21.4 we need to compute the probability that the noise
$(Z^{(1)}, Z^{(2)})$ causes the received signal to lie in the union of the shaded areas. The
symmetry of $f_{Y^{(1)},Y^{(2)}|M=4}(\cdot)$ about the horizontal axis (21.22) implies that the
probability that the received vector lies in the darkly-shaded region is the same as
the probability that it lies in the lightly-shaded region. We shall thus compute the
probability of the latter and double the result.


³ It can be shown that the probability that the observation lies exactly on the boundary of
the region is zero; see Proposition 21.6.2 ahead. We shall thus ignore this possibility.

Figure 21.4: Error analysis for 8-PSK.


Let $\psi = \pi/8$ denote half the angle between the constellation points. To carry out
the integration we shall use polar coordinates $(r, \theta)$ centered on the constellation
point $(-A, 0)$ corresponding to Message 4:

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 4) &= 2 \int_0^{\pi - \psi} \int_{\rho(\theta)}^{\infty} \frac{1}{2\pi\sigma^2}\, e^{-\frac{r^2}{2\sigma^2}}\, r\, dr\, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi - \psi} \int_{\rho^2(\theta)/(2\sigma^2)}^{\infty} e^{-u}\, du\, d\theta \\
&= \frac{1}{\pi} \int_0^{\pi - \psi} e^{-\frac{\rho^2(\theta)}{2\sigma^2}}\, d\theta,
\end{aligned} \tag{21.23}$$

where $\rho(\theta)$ is the distance we travel from the point $(-A, 0)$ at angle $\theta$ until we
reach the lightly-shaded region, and where the second equality follows using the
substitution $u \triangleq r^2/(2\sigma^2)$. Using the law of sines we have

$$\rho(\theta) = \frac{A \sin\psi}{\sin(\theta + \psi)}. \tag{21.24}$$

Since the symmetry of the problem implies that the conditional probability of error
conditioned on M = m does not depend on m, it follows from (21.23), (21.24), and
(21.18) that

$$p^*(\text{error}) = \frac{1}{\pi} \int_0^{\pi - \psi} e^{-\frac{A^2 \sin^2\psi}{2 \sin^2(\theta + \psi)\, \sigma^2}}\, d\theta, \quad \psi = \frac{\pi}{8}. \tag{21.25}$$
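The single integral in (21.25) is easy to evaluate numerically, and the result can be compared with a simulation of the nearest-neighbor decoder. The sketch below is one way to do this under assumed values A = 2 and σ = 1; the midpoint rule, the number of subintervals, the number of trials, and the function names are all choices made for the illustration.

```python
import numpy as np

def psk8_error_exact(A, sigma, n=20000):
    """Midpoint-rule evaluation of the integral in (21.25)."""
    psi = np.pi / 8
    width = (np.pi - psi) / n
    theta = (np.arange(n) + 0.5) * width                  # midpoints of [0, pi - psi]
    rho2 = (A * np.sin(psi) / np.sin(theta + psi)) ** 2   # rho(theta)^2 from (21.24)
    return np.exp(-rho2 / (2 * sigma ** 2)).sum() * width / np.pi

def psk8_error_monte_carlo(A, sigma, trials, rng):
    """Simulate M = 4 and count the errors of the nearest-neighbor decoder."""
    m = np.arange(1, 9)
    s = np.column_stack((A * np.cos(2 * np.pi * m / 8), A * np.sin(2 * np.pi * m / 8)))
    y = s[3] + sigma * rng.standard_normal((trials, 2))   # s[3] is Message 4
    d2 = ((y[:, None, :] - s[None, :, :]) ** 2).sum(axis=2)
    guesses = d2.argmin(axis=1) + 1
    return np.mean(guesses != 4)

rng = np.random.default_rng(2)
exact = psk8_error_exact(A=2.0, sigma=1.0)
simulated = psk8_error_monte_carlo(A=2.0, sigma=1.0, trials=200_000, rng=rng)
print(exact, simulated)
```

As a sanity check on the formula, letting σ grow without bound makes the integrand tend to one, so the integral tends to 7/8, the error probability of a blind guess among eight equiprobable messages.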


21.5 The Union-of-Events Bound 


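Although simple, the Union-of-Events Bound, or Union Bound for short, is an
extremely powerful and useful bound.⁴ To derive it, recall that one of the axioms
of probability is that the probability of the union of two disjoint events is the sum
of their probabilities.⁵ Given two not necessarily disjoint events $\mathcal{V}$ and $\mathcal{W}$, we can
express the set $\mathcal{V}$ as in Figure 21.5 as the union of those elements of $\mathcal{V}$ that are not
in $\mathcal{W}$ and those that are both in $\mathcal{V}$ and in $\mathcal{W}$:

$$\mathcal{V} = (\mathcal{V} \setminus \mathcal{W}) \cup (\mathcal{V} \cap \mathcal{W}). \tag{21.26}$$

Because the sets $\mathcal{V} \setminus \mathcal{W}$ and $\mathcal{V} \cap \mathcal{W}$ are disjoint, and because their union is $\mathcal{V}$, it
follows that $\Pr(\mathcal{V}) = \Pr(\mathcal{V} \setminus \mathcal{W}) + \Pr(\mathcal{V} \cap \mathcal{W})$, which can also be written as

$$\Pr(\mathcal{V} \setminus \mathcal{W}) = \Pr(\mathcal{V}) - \Pr(\mathcal{V} \cap \mathcal{W}). \tag{21.27}$$

Writing the union $\mathcal{V} \cup \mathcal{W}$ as the union of two disjoint sets

$$\mathcal{V} \cup \mathcal{W} = \mathcal{W} \cup (\mathcal{V} \setminus \mathcal{W}) \tag{21.28}$$

as in Figure 21.6, we conclude that

$$\Pr(\mathcal{V} \cup \mathcal{W}) = \Pr(\mathcal{W}) + \Pr(\mathcal{V} \setminus \mathcal{W}), \tag{21.29}$$

which combines with (21.27) to prove that

$$\Pr(\mathcal{V} \cup \mathcal{W}) = \Pr(\mathcal{V}) + \Pr(\mathcal{W}) - \Pr(\mathcal{V} \cap \mathcal{W}). \tag{21.30}$$

Since probabilities are nonnegative, it follows from (21.30) that

$$\Pr(\mathcal{V} \cup \mathcal{W}) \le \Pr(\mathcal{V}) + \Pr(\mathcal{W}), \tag{21.31}$$

which is the Union Bound. This bound can also be extended to derive an upper
bound on the union of more sets. For example, we can show that for three events
$\mathcal{U}, \mathcal{V}, \mathcal{W}$ we have $\Pr(\mathcal{U} \cup \mathcal{V} \cup \mathcal{W}) \le \Pr(\mathcal{U}) + \Pr(\mathcal{V}) + \Pr(\mathcal{W})$. Indeed, by first applying
the claim to the two sets $\mathcal{U}$ and $(\mathcal{V} \cup \mathcal{W})$ we obtain

$$\begin{aligned}
\Pr(\mathcal{U} \cup \mathcal{V} \cup \mathcal{W}) &= \Pr\bigl(\mathcal{U} \cup (\mathcal{V} \cup \mathcal{W})\bigr) \\
&\le \Pr(\mathcal{U}) + \Pr(\mathcal{V} \cup \mathcal{W}) \\
&\le \Pr(\mathcal{U}) + \Pr(\mathcal{V}) + \Pr(\mathcal{W}),
\end{aligned}$$

where the last inequality follows by applying the inequality to the two sets $\mathcal{V}$ and $\mathcal{W}$.
One can continue the argument by induction for a finite⁶ collection of events to
obtain:


⁴ It is also sometimes called Boole’s Inequality.
⁵ Actually the axiom is stronger; it states that this holds also for a countably infinite number
of sets.

Figure 21.5: Diagram of two nondisjoint sets.

Figure 21.6: Diagram of the union of two nondisjoint sets.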


Theorem 21.5.1 (Union-of-Events Bound). If $\mathcal{V}_1, \mathcal{V}_2, \ldots$ is a finite or countably
infinite collection of events then

$$\Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) \le \sum_j \Pr(\mathcal{V}_j). \tag{21.32}$$


We can think about the LHS of (21.32) as the probability that at least one of
the events $\mathcal{V}_1, \mathcal{V}_2, \ldots$ occurs and of its RHS as the expected number of events that
occur. Indeed, if for each j we define the random variables $X_j(\omega) = \mathrm{I}\{\omega \in \mathcal{V}_j\}$ for
all $\omega \in \Omega$, then the LHS of (21.32) is equal to $\Pr\bigl[\sum_j X_j > 0\bigr]$, and the RHS is
$\sum_j \mathsf{E}[X_j]$, which can also be expressed as $\mathsf{E}\bigl[\sum_j X_j\bigr]$.

After the trivial bound that the probability of any event cannot exceed one, the 
Union Bound is probably the most important bound in Probability Theory. What 
makes it so useful is the fact that the RHS of (21.32) can be computed without 
regard to any dependencies between the events. 
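The interpretation of the two sides of (21.32) via indicator random variables can be illustrated by simulation. In the sketch below the three events are hypothetical: they are defined through a single uniform random variable so that they overlap and are dependent, and the sample size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.random(100_000)       # one realization of omega per trial, uniform on [0, 1)

# Three dependent events defined on the same omega; row j holds X_j = I{omega in V_j}.
events = np.stack([u < 0.30, (u > 0.20) & (u < 0.45), u > 0.90])
count = events.sum(axis=0)    # sum_j X_j: the number of events that occur

lhs = np.mean(count > 0)             # estimate of Pr(union of the V_j)
rhs = events.mean(axis=1).sum()      # estimate of sum_j Pr(V_j) = E[sum_j X_j]
print(lhs, rhs)
```

For these events the union has probability 0.55 while the individual probabilities sum to 0.65; the gap is the probability of the overlap between the first two events, which the Union Bound counts twice.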


Corollary 21.5.2.


(i) If each of a finite (or countably infinite) collection of events occurs with
probability zero, then their union also occurs with probability zero.

(ii) If each of a finite (or countably infinite) collection of events occurs with
probability one, then their intersection also occurs with probability one.


⁶ In fact, this claim holds for a countably infinite number of events.


Proof. To prove Part (i) we assume that each of the events $\mathcal{V}_1, \mathcal{V}_2, \ldots$ is of zero
probability and compute

$$\begin{aligned}
\Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) &\le \sum_j \Pr(\mathcal{V}_j) \\
&= \sum_j 0 \\
&= 0,
\end{aligned}$$

where the first inequality follows from the Union Bound, and where the subsequent
equality follows from our assumption that $\Pr(\mathcal{V}_j) = 0$, for all j.


To prove Part (ii) we assume that each of the events $\mathcal{W}_1, \mathcal{W}_2, \ldots$ occurs with
probability one and apply Part (i) to the sets $\mathcal{V}_1, \mathcal{V}_2, \ldots$, where $\mathcal{V}_j$ is the
set-complement of $\mathcal{W}_j$, i.e., $\mathcal{V}_j = \Omega \setminus \mathcal{W}_j$:

$$\begin{aligned}
\Pr\Bigl(\bigcap_j \mathcal{W}_j\Bigr) &= 1 - \Pr\Bigl(\Bigl(\bigcap_j \mathcal{W}_j\Bigr)^{\mathrm{c}}\Bigr) \\
&= 1 - \Pr\Bigl(\bigcup_j \mathcal{V}_j\Bigr) \\
&= 1,
\end{aligned}$$

where the first equality follows because the probabilities of an event and its
complement sum to one; the second because the complement of an intersection is the
union of the complements; and the final equality follows from Part (i) because
the events $\mathcal{W}_j$ are, by assumption, of probability one so their complements are of
probability zero.


21.5.1 Applications to Hypothesis Testing 


We shall now use the Union Bound to derive an upper bound on the conditional
probability of error $p_{\text{MAP}}(\text{error} \mid M = m)$ of the MAP decoding rule. The bound
we derive is applicable to any decision rule that satisfies the hypothesis of
Theorem 21.3.1 as expressed in (21.16).


Define for every $m' \ne m$ the set $\mathcal{B}_{m,m'} \subset \mathbb{R}^d$ by

$$\mathcal{B}_{m,m'} = \bigl\{y \in \mathbb{R}^d : \pi_{m'} f_{Y|M=m'}(y) \ge \pi_m f_{Y|M=m}(y)\bigr\}. \tag{21.33}$$

Notice that $y \in \mathcal{B}_{m,m'}$ does not imply that the MAP rule will guess m′: there may
be a third hypothesis that is a posteriori even more likely than either m or m′.
Also, since the inequality in (21.33) is not strict, $y \in \mathcal{B}_{m,m'}$ does not imply that
the MAP rule will not guess m: there may be a tie, which may be resolved in favor
of m. As we next argue, what is true is that if m was not guessed by the MAP rule,
then some m′ which is not equal to m must have had an a posteriori probability
that is at least as high as that of m:

$$(m \text{ was not guessed}) \implies \Bigl(Y \in \bigcup_{m' \ne m} \mathcal{B}_{m,m'}\Bigr). \tag{21.34}$$

Indeed, if m was not guessed by the MAP rule, then some other message was.
Denoting that other message by m′, we note that $\pi_{m'} f_{Y|M=m'}(y)$ must be at least
as large as $\pi_m f_{Y|M=m}(y)$ (because otherwise m′ would not have been guessed),
so $y \in \mathcal{B}_{m,m'}$.


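Continuing from (21.34), we note that if the occurrence of an event $\mathcal{E}_1$ implies the
occurrence of an event $\mathcal{E}_2$, then $\Pr(\mathcal{E}_1) \le \Pr(\mathcal{E}_2)$. Consequently, by (21.34),

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \Pr\Bigl[Y \in \bigcup_{m' \ne m} \mathcal{B}_{m,m'} \Bigm| M = m\Bigr] \\
&= \Pr\Bigl(\bigcup_{m' \ne m} \{\omega \in \Omega : Y(\omega) \in \mathcal{B}_{m,m'}\} \Bigm| M = m\Bigr) \\
&\le \sum_{m' \ne m} \Pr\bigl(\{\omega \in \Omega : Y(\omega) \in \mathcal{B}_{m,m'}\} \bigm| M = m\bigr) \\
&= \sum_{m' \ne m} \Pr\bigl[Y \in \mathcal{B}_{m,m'} \bigm| M = m\bigr] \\
&= \sum_{m' \ne m} \int_{\mathcal{B}_{m,m'}} f_{Y|M=m}(y)\, dy.
\end{aligned}$$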


We have thus derived: 


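Proposition 21.5.3. For the setup of Theorem 21.3.1 let $p_{\text{MAP}}(\text{error} \mid M = m)$
denote the conditional probability of error conditional on M = m of the MAP rule
for guessing M based on Y. Then,

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \sum_{m' \ne m} \Pr\bigl[Y \in \mathcal{B}_{m,m'} \bigm| M = m\bigr] && (21.35) \\
&= \sum_{m' \ne m} \int_{\mathcal{B}_{m,m'}} f_{Y|M=m}(y)\, dy, && (21.36)
\end{aligned}$$

where

$$\mathcal{B}_{m,m'} = \bigl\{y \in \mathbb{R}^d : \pi_{m'} f_{Y|M=m'}(y) \ge \pi_m f_{Y|M=m}(y)\bigr\}. \tag{21.37}$$

This bound is applicable to any decision rule satisfying the hypothesis of
Theorem 21.3.1 as expressed in (21.16).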


The term $\Pr(Y \in \mathcal{B}_{m,m'} \mid M = m)$ has an interesting interpretation. If ties occur
with probability zero, then it corresponds to the conditional probability of error
(given that M = m) incurred by a MAP decoder designed for the binary hypothesis
testing problem of guessing whether M = m or M = m′ when the prior probability
that M = m is $\pi_m/(\pi_m + \pi_{m'})$ and that M = m′ is $\pi_{m'}/(\pi_m + \pi_{m'})$.

Figure 21.7: Error events for 8-PSK conditional on M = 4.


Alternatively, we can write (21.35) as

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} \Pr\bigl[\pi_{m'} f_{Y|M=m'}(Y) \ge \pi_m f_{Y|M=m}(Y) \bigm| M = m\bigr]. \tag{21.38}$$


21.5.2 Example: The Union Bound for 8-PSK 


We next apply the Union Bound to upper-bound the probability of error associated
with maximum-likelihood decoding of 8-PSK. For concreteness we focus on the
conditional probability of error, conditional on M = 4. We shall see that in this
case the RHS of (21.35) is still an upper bound on the probability of error even if
we do not sum over all m′ that differ from m. Indeed, as we next argue, in upper-bounding
the conditional probability of error of the ML decoder given M = 4, it
suffices to sum over $m' \in \{3, 5\}$ only.


To show this we first note that for this problem the set $\mathcal{B}_{m,m'}$ of (21.33) corresponds
to the set of vectors that are at least as close to $(a_{m'}, b_{m'})$ as to $(a_m, b_m)$:

$$\mathcal{B}_{m,m'} = \bigl\{y \in \mathbb{R}^2 : (y^{(1)} - a_{m'})^2 + (y^{(2)} - b_{m'})^2 \le (y^{(1)} - a_m)^2 + (y^{(2)} - b_m)^2\bigr\}.$$

As seen in Figure 21.7, given M = 4, an error will occur only if the observed
vector Y is at least as close to $(a_3, b_3)$ as to $(a_4, b_4)$, or if it is at least as close
to $(a_5, b_5)$ as to $(a_4, b_4)$. Thus, conditional on M = 4, an error can occur only if
$Y \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$. (If $Y \notin \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, then an error will certainly not occur. If
$Y \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, then an error may or may not occur. It will not occur in the case
of a tie, corresponding to Y being on the boundary of $\mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$, provided that
the tie is resolved in favor of M = 4.)


Note that the events $Y \in \mathcal{B}_{4,5}$ and $Y \in \mathcal{B}_{4,3}$ are not mutually exclusive, but,
nevertheless, by the Union-of-Events Bound

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 4) &\le \Pr\bigl[Y \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5} \bigm| M = 4\bigr] \\
&\le \Pr\bigl[Y \in \mathcal{B}_{4,3} \bigm| M = 4\bigr] + \Pr\bigl[Y \in \mathcal{B}_{4,5} \bigm| M = 4\bigr],
\end{aligned} \tag{21.39}$$

where the first inequality follows because, conditional on M = 4, an error can
occur only if $Y \in \mathcal{B}_{4,3} \cup \mathcal{B}_{4,5}$; and where the second inequality follows from the
Union-of-Events Bound. In fact, the first inequality holds with equality because,
for this problem, the probability of a tie is zero; see Proposition 21.6.2 ahead.


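From our analysis of multi-dimensional binary hypothesis testing (Lemma 20.14.1)
we obtain that

$$\begin{aligned}
\Pr\bigl[Y \in \mathcal{B}_{4,3} \bigm| M = 4\bigr] &= Q\Bigl(\frac{\sqrt{(a_4 - a_3)^2 + (b_4 - b_3)^2}}{2\sigma}\Bigr) \\
&= Q\Bigl(\frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr)\Bigr)
\end{aligned} \tag{21.40}$$

and

$$\begin{aligned}
\Pr\bigl[Y \in \mathcal{B}_{4,5} \bigm| M = 4\bigr] &= Q\Bigl(\frac{\sqrt{(a_4 - a_5)^2 + (b_4 - b_5)^2}}{2\sigma}\Bigr) \\
&= Q\Bigl(\frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr)\Bigr).
\end{aligned} \tag{21.41}$$

Combining (21.39), (21.40), and (21.41) we obtain

$$p_{\text{MAP}}(\text{error} \mid M = 4) \le 2 Q\Bigl(\frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr)\Bigr). \tag{21.42}$$

This is only an upper bound and not the exact error probability because the sets
$\mathcal{B}_{4,3}$ and $\mathcal{B}_{4,5}$ are not disjoint so the events $Y \in \mathcal{B}_{4,3}$ and $Y \in \mathcal{B}_{4,5}$ are not disjoint
and the Union Bound is not tight; see Figure 21.7.


For this symmetric problem the conditional probability of error conditional on
M = m does not depend on the message m, and we thus also have by (21.18)

$$p^*(\text{error}) \le 2 Q\Bigl(\frac{A}{\sigma} \sin\Bigl(\frac{\pi}{8}\Bigr)\Bigr). \tag{21.43}$$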
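To get a feel for how tight the bound (21.43) is, one can compare it numerically with the exact expression (21.25). The following sketch does so for a few values of the ratio A/σ; the chosen ratios, the midpoint-rule quadrature, and the function names are assumptions made for the illustration. The Q-function is expressed through the complementary error function as $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def psk8_union_bound(A, sigma):
    """Right-hand side of (21.43)."""
    return 2.0 * q_function(A / sigma * math.sin(math.pi / 8))

def psk8_error_exact(A, sigma, n=20000):
    """Midpoint-rule evaluation of the integral in (21.25)."""
    psi = math.pi / 8
    width = (math.pi - psi) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * width
        rho = A * math.sin(psi) / math.sin(theta + psi)   # (21.24)
        total += math.exp(-rho ** 2 / (2 * sigma ** 2))
    return total * width / math.pi

# (A/sigma, exact error probability, union bound) for a few illustrative ratios.
rows = [(ratio, psk8_error_exact(ratio, 1.0), psk8_union_bound(ratio, 1.0))
        for ratio in (0.5, 1.0, 2.0, 3.0)]
for ratio, exact, bound in rows:
    print(f"A/sigma = {ratio:3.1f}   exact = {exact:.5f}   union bound = {bound:.5f}")
```

The gap between the two columns is the probability of the overlap of the two sets in Figure 21.7, which shrinks as A/σ grows.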


21.5.3 Union-Bhattacharyya Bound


We next derive a bound which is looser than the Union Bound but which is of 
ten easier to evaluate in non-Gaussian settings. It is the multi-hypothesis testing 
version of the Bhattacharyya Bound (20.50). 


Recall that, by Theorem 21.3.1, any guessing rule whose guess after observing that 
Y = Yobs is in the set 


M(Yobs) = {in EM : tim fy|m=m(Yobs) = ae { rm’ Frrint=m(¥oos) } } 


is optimal. To analyze the optimal probability of error p*(error), we shall analyze 
one particular optimal decision rule. This rule is not the MAP rule, but it differs 
from the MAP rule only in the way it resolves ties. Rather than resolving ties at
random, this rule resolves ties according to the index of the hypothesis: it chooses
the message in $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}})$ of smallest index. For example, if the messages of highest
a posteriori probability are Messages 7, 9, and 17, i.e., if $\tilde{\mathcal{M}}(\mathbf{y}_{\text{obs}}) = \{7, 9, 17\}$, then
it guesses “7.” This decision rule may not appeal to the reader’s sense of fairness
but, by Theorem 21.3.1, it is nonetheless optimal. Consequently, if we denote the
conditional probability of error of this decoder by $p(\text{error} \mid M = m)$, then


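$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m\, p(\text{error} \mid M = m). \tag{21.44}$$

We next analyze the performance of this decision rule. For every $m' \ne m$ let

$$\mathcal{D}_{m,m'} = \begin{cases} \bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\} & \text{if } m' < m, \\ \bigl\{ \mathbf{y} \in \mathbb{R}^d : \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) > \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr\} & \text{if } m' > m. \end{cases} \tag{21.45}$$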


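Notice that

$$\mathcal{D}_{m,m'} = \mathcal{D}^{\mathrm{c}}_{m',m}, \quad m \ne m'. \tag{21.46}$$

Conditional on $M = m$, our detector will err if, and only if, $\mathbf{y}_{\text{obs}} \in \bigcup_{m' \ne m} \mathcal{D}_{m,m'}$.
Thus

$$\begin{aligned}
\Pr(\text{error} \mid M = m) &= \Pr\Bigl[ \mathbf{Y} \in \bigcup_{m' \ne m} \mathcal{D}_{m,m'} \Bigm| M = m \Bigr] \\
&= \Pr\Bigl( \bigcup_{m' \ne m} \bigl\{ \omega \in \Omega : \mathbf{Y}(\omega) \in \mathcal{D}_{m,m'} \bigr\} \Bigm| M = m \Bigr) \\
&\le \sum_{m' \ne m} \Pr\bigl( \{ \omega \in \Omega : \mathbf{Y}(\omega) \in \mathcal{D}_{m,m'} \} \bigm| M = m \bigr) \\
&= \sum_{m' \ne m} \Pr[ \mathbf{Y} \in \mathcal{D}_{m,m'} \mid M = m ] \\
&= \sum_{m' \ne m} \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \,\mathrm{d}\mathbf{y},
\end{aligned} \tag{21.47}$$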


where the inequality follows from the Union Bound. To upper-bound p* (error) we 
use (21.44) and (21.47) to obtain 


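$$\begin{aligned}
p^*(\text{error}) &= \sum_{m=1}^{M} \pi_m \Pr(\text{error} \mid M = m) \\
&\le \sum_{m=1}^{M} \pi_m \sum_{m' \ne m} \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \,\mathrm{d}\mathbf{y} \\
&= \sum_{m=1}^{M} \sum_{m' > m} \Bigl( \pi_m \int_{\mathcal{D}_{m,m'}} f_{\mathbf{Y}|M=m}(\mathbf{y}) \,\mathrm{d}\mathbf{y} + \pi_{m'} \int_{\mathcal{D}_{m',m}} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \,\mathrm{d}\mathbf{y} \Bigr) \\
&= \sum_{m=1}^{M} \sum_{m' > m} \Bigl( \int_{\mathcal{D}_{m,m'}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \,\mathrm{d}\mathbf{y} + \int_{\mathcal{D}^{\mathrm{c}}_{m,m'}} \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \,\mathrm{d}\mathbf{y} \Bigr) \\
&= \sum_{m=1}^{M} \sum_{m' > m} \int_{\mathbb{R}^d} \min\bigl\{ \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}),\ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr\} \,\mathrm{d}\mathbf{y} \\
&\le \sum_{m=1}^{M} \sum_{m' > m} \sqrt{\pi_m \pi_{m'}} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \,\mathrm{d}\mathbf{y} \\
&\le \sum_{m=1}^{M} \sum_{m' > m} \frac{\pi_m + \pi_{m'}}{2} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \,\mathrm{d}\mathbf{y} \\
&= \frac{1}{2} \sum_{m=1}^{M} \sum_{m' \ne m} \frac{\pi_m + \pi_{m'}}{2} \int_{\mathbb{R}^d} \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \,\mathrm{d}\mathbf{y},
\end{aligned}$$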


where the equality in the first line follows from (21.44); the inequality in the second
line from (21.47); the equality in the third line by rearranging the sum; the equality
in the fourth line from (21.46); the equality in the fifth line from the definition of
the set $\mathcal{D}_{m,m'}$; the inequality in the sixth line from the inequality $\min\{a, b\} \le \sqrt{ab}$,
which holds for all nonnegative $a, b \in \mathbb{R}$ (see (20.48)); the inequality in the seventh
line from the Arithmetic-Geometric Inequality $\sqrt{cd} \le (c + d)/2$, which holds for
all $c, d \ge 0$ (see (20.49)); and the final equality by the symmetry of the summand.
We have thus obtained the Union-Bhattacharyya Bound: 


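$$p^*(\text{error}) \le \frac{1}{2} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} \frac{\pi_m + \pi_{m'}}{2} \int \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \,\mathrm{d}\mathbf{y}. \tag{21.48}$$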


For a priori equally likely hypotheses it takes the form 


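$$p^*(\text{error}) \le \frac{1}{2M} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} \int \sqrt{ f_{\mathbf{Y}|M=m}(\mathbf{y})\, f_{\mathbf{Y}|M=m'}(\mathbf{y}) } \,\mathrm{d}\mathbf{y}, \tag{21.49}$$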


which is the Union-Bhattacharyya Bound for M-ary hypothesis testing with a 
uniform prior. 
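
To make the double sum in (21.48) concrete, here is a minimal sketch that evaluates the Union-Bhattacharyya Bound when the Bhattacharyya integrals are available in closed form. It assumes, purely for illustration, Gaussian densities with mean vectors $\mathbf{s}_m$ and common variance $\sigma^2$ per component, for which the integral of $\sqrt{f_{\mathbf{Y}|M=m} f_{\mathbf{Y}|M=m'}}$ equals $\exp(-\|\mathbf{s}_m - \mathbf{s}_{m'}\|^2 / (8\sigma^2))$; this closed form is a standard fact that is not derived in this section. The function names are hypothetical.

```python
import math


def bhattacharyya_gaussian(s1, s2, sigma):
    """Integral of sqrt(f1 * f2) for Gaussians with means s1, s2 and variance sigma^2."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(s1, s2))
    return math.exp(-dist_sq / (8.0 * sigma ** 2))


def union_bhattacharyya_bound(prior, means, sigma):
    """RHS of (21.48): (1/2) sum_m sum_{m' != m} ((pi_m + pi_m') / 2) * integral."""
    num = len(means)
    total = 0.0
    for m in range(num):
        for mp in range(num):
            if mp != m:
                weight = 0.5 * (prior[m] + prior[mp])
                total += weight * bhattacharyya_gaussian(means[m], means[mp], sigma)
    return 0.5 * total


# Two equally likely messages at distance 2 with sigma = 1.
binary_bound = union_bhattacharyya_bound([0.5, 0.5], [(0.0,), (2.0,)], 1.0)
```

For two equally likely messages the sketch reduces to one half of the Bhattacharyya integral, which is the form the binary bound takes for a uniform prior.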


21.6 Multi-Dimensional M-ary Gaussian Hypothesis Testing


We next use Theorem 21.3.3 to study the multi-hypothesis testing version of the 
problem we addressed in Section 20.14. We begin with the problem setup and then 
proceed to derive the MAP decision rule. We then assess the performance of this 
rule by deriving an upper bound and a lower bound on its probability of error. 


21.6.1 Problem Setup 


A random variable M takes value in the set $\mathcal{M} = \{1, \ldots, M\}$ with a nondegenerate
prior (21.4). We wish to guess M based on an observation consisting of a random
column-vector $\mathbf{Y}$ taking value in $\mathbb{R}^J$ whose components are given by $Y^{(1)}, \ldots, Y^{(J)}$.⁷


⁷Our observation now takes value in $\mathbb{R}^J$ and not as before in $\mathbb{R}^d$. My excuse for using J instead
of d is that later, when we refer to this section, d will have a different meaning and choosing J
here reduces the chance of confusion later on.




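For typographical reasons we denote the observed realization of $\mathbf{Y}$ by $\mathbf{y}$, instead
of $\mathbf{y}_{\text{obs}}$. For every $m \in \mathcal{M}$ we have that, conditional on $M = m$, the components
of $\mathbf{Y}$ are independent Gaussians, with $Y^{(j)} \sim \mathcal{N}\bigl(s_m^{(j)}, \sigma^2\bigr)$, where $\mathbf{s}_m$ is some
deterministic vector of J components $s_m^{(1)}, \ldots, s_m^{(J)}$, and where $\sigma^2 > 0$. Recalling the
density of the univariate Gaussian distribution (19.6) and using the conditional
independence of the components of $\mathbf{Y}$ given $M = m$, we can express the conditional
density $f_{\mathbf{Y}|M=m}(\mathbf{y})$ of the vector $\mathbf{Y}$ at every point $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}}$ in $\mathbb{R}^J$ as

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = \prod_{j=1}^{J} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{\bigl(y^{(j)} - s_m^{(j)}\bigr)^2}{2\sigma^2} \right) = \frac{1}{(2\pi\sigma^2)^{J/2}} \exp\!\left( -\frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl(y^{(j)} - s_m^{(j)}\bigr)^2 \right). \tag{21.50}$$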


21.6.2 Optimal Guessing Rule 


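Using Theorem 21.3.3 we obtain that, having observed $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}} \in \mathbb{R}^J$,
an optimal decision rule is the MAP rule, which picks uniformly at random an
element from the set

$$\begin{aligned}
\tilde{\mathcal{M}}(\mathbf{y}) &= \Bigl\{ \tilde{m} \in \mathcal{M} : \pi_{\tilde{m}} f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}) = \max_{m' \in \mathcal{M}} \bigl\{ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr\} \Bigr\} \\
&= \Bigl\{ \tilde{m} \in \mathcal{M} : \ln\bigl( \pi_{\tilde{m}} f_{\mathbf{Y}|M=\tilde{m}}(\mathbf{y}) \bigr) = \max_{m' \in \mathcal{M}} \bigl\{ \ln\bigl( \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{y}) \bigr) \bigr\} \Bigr\},
\end{aligned} \tag{21.51}$$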


where the second equality follows from the strict monotonicity of the logarithm. 
We next obtain a more explicit description of M(y) for our setup. By (21.50), 


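$$\ln\bigl( \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr) = \ln \pi_m - \frac{J}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{j=1}^{J} \bigl( y^{(j)} - s_m^{(j)} \bigr)^2. \tag{21.52}$$


The term $(J/2) \ln(2\pi\sigma^2)$ is a constant term that does not depend on the hypothesis.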
Consequently, it does not influence the set of messages that attain the highest score. 
(The tallest student in the class is the same irrespective of whether the height of all 
the students is measured when they are barefoot or when they are all wearing the 
one-inch heel school uniform shoes. The heel can only make a difference if different 
students wear shoes of different heel height.) Thus, 


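$$\tilde{\mathcal{M}}(\mathbf{y}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \ln \pi_{\tilde{m}} - \sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{\tilde{m}}^{(j)} \bigr)^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ \ln \pi_{m'} - \sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{m'}^{(j)} \bigr)^2}{2\sigma^2} \Bigr\} \Bigr\}.$$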


The expression for M(y) can be further simplified if M is a priori uniformly 
distributed. In this case we have 


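$$\begin{aligned}
\tilde{\mathcal{M}}(\mathbf{y}) &= \Bigl\{ \tilde{m} \in \mathcal{M} : -\sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{\tilde{m}}^{(j)} \bigr)^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ -\sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{m'}^{(j)} \bigr)^2}{2\sigma^2} \Bigr\} \Bigr\} \\
&= \Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} \bigl( y^{(j)} - s_{\tilde{m}}^{(j)} \bigr)^2 = \min_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} \bigl( y^{(j)} - s_{m'}^{(j)} \bigr)^2 \Bigr\} \Bigr\},
\end{aligned}$$

where the first equality follows because when M is uniform the additive term $\ln \pi_m$
is given by $\ln(1/M)$ and hence does not depend on the hypothesis; and where the
second equality follows because changing the sign of all the elements of a set changes
the largest ones to the smallest ones, and by noting that scaling the score by $2\sigma^2$
does not change the highest scoring messages (because we assumed that $\sigma^2 > 0$).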


If we interpret the quantity 


$$\|\mathbf{y} - \mathbf{s}_m\| = \sqrt{ \sum_{j=1}^{J} \bigl( y^{(j)} - s_m^{(j)} \bigr)^2 }$$

as the Euclidean distance between the vector $\mathbf{y}$ and the vector $\mathbf{s}_m$, then we see that,
for a uniform prior on M, it is optimal to guess the message m whose corresponding
mean vector $\mathbf{s}_m$ is closest to the observed vector $\mathbf{y}$. Notice that to implement this
“nearest-neighbor” decision rule we do not need to know the value of $\sigma^2$.


We next show that if, in addition to assuming a uniform prior on M, we also 
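assume that the vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ all have the same norm, i.e.,

$$\|\mathbf{s}_1\| = \|\mathbf{s}_2\| = \cdots = \|\mathbf{s}_M\|, \tag{21.53}$$

then

$$\tilde{\mathcal{M}}(\mathbf{y}) = \Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} y^{(j)} s_{\tilde{m}}^{(j)} = \max_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} y^{(j)} s_{m'}^{(j)} \Bigr\} \Bigr\},$$

so the MAP decision rule guesses the message m whose mean vector $\mathbf{s}_m$ has
the “highest correlation” with the received vector $\mathbf{y}$. To see this, we note that
because M has a uniform prior the “nearest-neighbor” decoding rule is optimal,
and we then expand

$$\|\mathbf{y} - \mathbf{s}_m\|^2 = \sum_{j=1}^{J} \bigl( y^{(j)} - s_m^{(j)} \bigr)^2 = \sum_{j=1}^{J} \bigl( y^{(j)} \bigr)^2 - 2 \sum_{j=1}^{J} y^{(j)} s_m^{(j)} + \sum_{j=1}^{J} \bigl( s_m^{(j)} \bigr)^2,$$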


where the first term does not depend on the hypothesis and where, by (21.53), the 
third term also does not depend on the hypothesis. 


We summarize our findings in the following proposition. 


Proposition 21.6.1. Consider the problem described in Section 21.6.1 of guess- 
ing M based on the observation y. 


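(i) It is optimal to form the guess based on $\mathbf{y} = (y^{(1)}, \ldots, y^{(J)})^{\mathsf{T}}$ by choosing
uniformly at random from the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \ln \pi_{\tilde{m}} - \sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{\tilde{m}}^{(j)} \bigr)^2}{2\sigma^2} = \max_{m' \in \mathcal{M}} \Bigl\{ \ln \pi_{m'} - \sum_{j=1}^{J} \frac{\bigl( y^{(j)} - s_{m'}^{(j)} \bigr)^2}{2\sigma^2} \Bigr\} \Bigr\}.$$

(ii) If M is uniformly distributed, then this rule is equivalent to the “nearest-neighbor”
decoding rule of picking uniformly at random an element of the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \|\mathbf{y} - \mathbf{s}_{\tilde{m}}\| = \min_{m' \in \mathcal{M}} \bigl\{ \|\mathbf{y} - \mathbf{s}_{m'}\| \bigr\} \Bigr\}.$$

(iii) If, in addition to M being uniform, we also assume that the mean vectors
satisfy (21.53), then this rule is equivalent to the “maximum-correlation” rule
of picking at random an element of the set

$$\Bigl\{ \tilde{m} \in \mathcal{M} : \sum_{j=1}^{J} y^{(j)} s_{\tilde{m}}^{(j)} = \max_{m' \in \mathcal{M}} \Bigl\{ \sum_{j=1}^{J} y^{(j)} s_{m'}^{(j)} \Bigr\} \Bigr\}.$$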
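
The three rules of Proposition 21.6.1 can be sketched in a few lines. This is only an illustration under stated assumptions: messages are numbered from 1, the function names are hypothetical, and ties are resolved in favor of the smallest index rather than uniformly at random (which, as noted in Section 21.5.3, does not affect optimality).

```python
import math


def map_guess(y, means, prior, sigma):
    """Rule (i): maximize ln(pi_m) - ||y - s_m||^2 / (2 sigma^2)."""
    scores = [math.log(p) - math.dist(y, s) ** 2 / (2.0 * sigma ** 2)
              for s, p in zip(means, prior)]
    # max() returns the first maximizer, i.e., the smallest index among ties.
    return 1 + max(range(len(means)), key=scores.__getitem__)


def nearest_neighbor_guess(y, means):
    """Rule (ii), uniform prior: minimize the Euclidean distance ||y - s_m||."""
    return 1 + min(range(len(means)), key=lambda m: math.dist(y, means[m]))


def max_correlation_guess(y, means):
    """Rule (iii), uniform prior and equal norms: maximize sum_j y^(j) s_m^(j)."""
    return 1 + max(range(len(means)),
                   key=lambda m: sum(yj * sj for yj, sj in zip(y, means[m])))


# Four equal-norm mean vectors in the plane and one observation.
means = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
y = (0.9, 0.2)
```

With a uniform prior all three functions return Message 1 for this observation; a prior that strongly favors Message 2 changes the MAP guess, whereas the nearest-neighbor rule does not depend on the prior or on $\sigma^2$.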


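We next show that if the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ are distinct in the sense that for
every pair $m' \ne m''$ in $\mathcal{M}$ there exists at least one component where the vectors
$\mathbf{s}_{m'}$ and $\mathbf{s}_{m''}$ differ, i.e.,

$$\|\mathbf{s}_{m'} - \mathbf{s}_{m''}\| > 0, \quad m' \ne m'',$$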


then the probability of ties is zero. That is, we will show that the probability of 
observing a vector y for which the set M(y) (21.51) has more than one element 
is zero. Stated in yet another way, the probability that the observable Y will be 
such that the MAP will require randomization is zero. Stated one last time: 


Proposition 21.6.2. If the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$ in our setup are distinct, then
with probability one the observed vector $\mathbf{y}$ is such that there is a unique message of
highest a posteriori probability.


Proof. Conditional on $\mathbf{Y} = \mathbf{y}$, associate with each message $m \in \mathcal{M}$ the score
$\ln\bigl( \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr)$. We need to show that the probability of the observation $\mathbf{y}$
being such that at least two messages attain the highest score is zero. Instead, we
shall prove the stronger statement that the probability of two messages attaining
the same score (be it maximal or not) is zero.


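We first show that it suffices to prove that for every $m \in \mathcal{M}$ and for every pair of
messages $m' \ne m''$, we have that, conditional on $M = m$, the probability that $m'$
and $m''$ attain the same score is zero, i.e.,

$$\Pr(\text{score of Message } m' = \text{score of Message } m'' \mid M = m) = 0, \quad m' \ne m''. \tag{21.54}$$

Indeed, once we show (21.54), it will follow that the unconditional probability that
Message $m'$ attains the same score as Message $m''$ is zero, i.e.,

$$\Pr(\text{score of Message } m' = \text{score of Message } m'') = 0, \quad m' \ne m'', \tag{21.55}$$

because

$$\Pr(\text{score of Message } m' = \text{score of Message } m'') = \sum_{m \in \mathcal{M}} \pi_m \Pr(\text{score of Message } m' = \text{score of Message } m'' \mid M = m).$$

But (21.55) implies that the probability that any two or more messages attain the
highest score is zero because

$$\begin{aligned}
\Pr(\text{two or more messages attain the highest score}) &= \Pr\Bigl( \bigcup_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \{ m' \text{ and } m'' \text{ attain the highest score} \} \Bigr) \\
&\le \sum_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \Pr(m' \text{ and } m'' \text{ attain the highest score}) \\
&\le \sum_{\substack{m', m'' \in \mathcal{M} \\ m' \ne m''}} \Pr(m' \text{ and } m'' \text{ attain the same score}) \\
&= 0,
\end{aligned}$$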

where the first equality follows because more than one message attains the high- 
est score if, and only if, there exist two distinct messages m’ and m” that attain 
the highest score; the subsequent inequality follows from the Union Bound (Theo- 
rem 21.5.1); and the final inequality by noting that if m’ and m” both attain the 
highest score, then they both achieve the same score. 


Having established that in order to complete the proof it suffices to establish 
(21.54), we proceed to do so. By (21.52) we obtain, upon opening the square, 
that the observation Y results in Messages m’ and m” obtaining the same score if, 
and only if, 


$$\frac{1}{\sigma^2} \sum_{j=1}^{J} Y^{(j)} \bigl( s_{m'}^{(j)} - s_{m''}^{(j)} \bigr) = \ln \frac{\pi_{m''}}{\pi_{m'}} + \frac{1}{2\sigma^2} \bigl( \|\mathbf{s}_{m'}\|^2 - \|\mathbf{s}_{m''}\|^2 \bigr). \tag{21.56}$$

We next show that, conditional on $M = m$, the probability that $\mathbf{Y}$ satisfies (21.56)
is zero. To that end we note that, conditional on $M = m$, the random variables
$Y^{(1)}, \ldots, Y^{(J)}$ are independent random variables with $Y^{(j)}$ being Gaussian with
mean $s_m^{(j)}$ and variance $\sigma^2$; see (21.50). Consequently, by Proposition 19.7.3, we
have that, conditional on $M = m$, the LHS of (21.56) is a Gaussian random variable
of variance

$$\frac{1}{\sigma^2} \|\mathbf{s}_{m'} - \mathbf{s}_{m''}\|^2,$$

which is positive because $m' \ne m''$ and because we assumed that the mean vectors
are distinct. It follows that, conditional on M = m, the LHS of (21.56) is a 
Gaussian random variable of positive variance, and hence has zero probability of 


being equal to the deterministic number on the RHS of (21.56). This proves (21.54), 
and hence concludes the proof. 


21.6.3. The Union Bound 


We next use the Union Bound to upper-bound the optimal probability of error 
p* (error). By (21.38) 


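$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = m) &\le \sum_{m' \ne m} \Pr\bigl[ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{Y}) \ge \pi_m f_{\mathbf{Y}|M=m}(\mathbf{Y}) \bigm| M = m \bigr] \\
&= \sum_{m' \ne m} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \right),
\end{aligned} \tag{21.57}$$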


where the equality follows from Lemma 20.14.1. From this and from the optimality 
of the MAP rule (21.18) we thus obtain 


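$$p^*(\text{error}) \le \sum_{m \in \mathcal{M}} \pi_m \sum_{m' \ne m} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \right). \tag{21.58}$$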


If M is uniform, these bounds simplify to: 


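$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \ne m} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \right), \quad M \text{ uniform}, \tag{21.59}$$

$$p^*(\text{error}) \le \frac{1}{M} \sum_{m \in \mathcal{M}} \sum_{m' \ne m} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \right), \quad M \text{ uniform}. \tag{21.60}$$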


21.6.4 A Lower Bound 


We next derive a lower bound on the optimal error probability p* (error). We do so 
by lower-bounding the conditional probability of error pyap(error|M = m) of the 
MAP rule and by then using this lower bound to derive a lower bound on p* (error) 
via (21.18). 

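We note that if Message $m'$ attains a score that is strictly higher than the one
attained by Message m, then the MAP decoder will surely not guess “M = m.”
(The MAP may or may not guess “M = m'” depending on the score associated
with messages other than m and $m'$.) Thus, for each message $m' \ne m$ we have

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \Pr\bigl[ \pi_{m'} f_{\mathbf{Y}|M=m'}(\mathbf{Y}) > \pi_m f_{\mathbf{Y}|M=m}(\mathbf{Y}) \bigm| M = m \bigr] \tag{21.61}$$

$$= Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \right), \tag{21.62}$$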


where the equality follows from Lemma 20.14.1. 


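Noting that (21.62) holds for all $m' \ne m$, we can choose $m'$ to get the tightest
bound. This yields the lower bound

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \in \mathcal{M} \setminus \{m\}} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \right), \tag{21.63}$$

and hence, by (21.18),

$$p^*(\text{error}) \ge \sum_{m \in \mathcal{M}} \pi_m \max_{m' \in \mathcal{M} \setminus \{m\}} Q\!\left( \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} + \frac{\sigma}{\|\mathbf{s}_m - \mathbf{s}_{m'}\|} \ln \frac{\pi_m}{\pi_{m'}} \right). \tag{21.64}$$

For uniform M this expression can be simplified by noting that the Q-function is
strictly decreasing:

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge Q\!\left( \min_{m' \in \mathcal{M} \setminus \{m\}} \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \right), \quad M \text{ uniform}, \tag{21.65}$$

$$p^*(\text{error}) \ge \frac{1}{M} \sum_{m \in \mathcal{M}} Q\!\left( \min_{m' \in \mathcal{M} \setminus \{m\}} \frac{\|\mathbf{s}_m - \mathbf{s}_{m'}\|}{2\sigma} \right), \quad M \text{ uniform}. \tag{21.66}$$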
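
The upper bound (21.58) and the lower bound (21.64) use the same pairwise Q-function terms and differ only in whether those terms are summed or maximized. The following sketch evaluates both; it assumes distinct mean vectors and a strictly positive prior, and the function names are hypothetical.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def pairwise_term(m, mp, means, prior, sigma):
    """Q(||s_m - s_m'|| / (2 sigma) + (sigma / ||s_m - s_m'||) ln(pi_m / pi_m'))."""
    dist = math.dist(means[m], means[mp])  # assumes distinct means, so dist > 0
    return q_function(dist / (2.0 * sigma)
                      + sigma / dist * math.log(prior[m] / prior[mp]))


def union_upper_bound(means, prior, sigma):
    """RHS of (21.58): prior-weighted sum of all pairwise terms."""
    num = len(means)
    return sum(prior[m] * sum(pairwise_term(m, mp, means, prior, sigma)
                              for mp in range(num) if mp != m)
               for m in range(num))


def max_lower_bound(means, prior, sigma):
    """RHS of (21.64): prior-weighted sum of the largest pairwise term."""
    num = len(means)
    return sum(prior[m] * max(pairwise_term(m, mp, means, prior, sigma)
                              for mp in range(num) if mp != m)
               for m in range(num))


# Two antipodal points: both bounds reduce to Q(||s_1 - s_2|| / (2 sigma)).
antipodal = [(1.0,), (-1.0,)]
upper = union_upper_bound(antipodal, [0.5, 0.5], 1.0)
lower = max_lower_bound(antipodal, [0.5, 0.5], 1.0)
```

For two messages the bounds coincide, as they must, because there is only one alternative hypothesis; with more messages the gap between them indicates how much the Union Bound may overcount.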


21.7 Additional Reading 


For additional reading on multi-hypothesis testing see the recommended reading 
for Chapter 20. The problem of assessing the optimal probability of error for the 
multi-dimensional M-ary Gaussian hypothesis testing problem of Section 21.6 has 
received extensive attention in the coding literature. For a survey of these results 
see (Sason and Shamai, 2006). 


21.8 Exercises 


Exercise 21.1 (Ternary Gaussian Detection). Consider the following special case of the
problem discussed in Section 21.6. Here M is uniformly distributed over the set {1, 2, 3},
and the mean vectors $\mathbf{s}_1, \mathbf{s}_2, \mathbf{s}_3$ are given by

$$\mathbf{s}_1 = \mathbf{0}, \quad \mathbf{s}_2 = \mathbf{s}, \quad \mathbf{s}_3 = -\mathbf{s},$$

where $\mathbf{s}$ is some deterministic nonzero vector in $\mathbb{R}^J$. Find the conditional probability of
error of the MAP rule conditional on each hypothesis.


Exercise 21.2 (4-PSK Detection). Consider the setup of Section 21.4 with M = 4 and
$(a_1, b_1) = (0, A)$, $(a_2, b_2) = (-A, 0)$, $(a_3, b_3) = (0, -A)$, $(a_4, b_4) = (A, 0)$.


(i) Sketch the decision regions of the MAP decision rule. 


(ii) Using the Q-function, express the conditional probabilities of error of this rule 
conditional on each hypothesis. 


(iii) Compute an upper bound on $p_{\text{MAP}}(\text{error} \mid M = 1)$ using Proposition 21.5.3. Indicate
on the figure which events are summed two or three times. Can you improve the 
bound by summing only over a subset of the alternative hypotheses? 


Hint: In Part (ii) first find the probability of correct detection. 


Exercise 21.3 (A 7-ary QAM problem). Consider the problem addressed in Section 21.4
in the special case where M = 7 and where

$$a_m = A \cos\!\left( \frac{2\pi m}{6} \right), \quad b_m = A \sin\!\left( \frac{2\pi m}{6} \right), \quad m = 1, \ldots, 6,$$

$$a_7 = 0, \quad b_7 = 0.$$

(i) Illustrate the decision regions of the MAP (nearest-neighbor) guessing rule.


(ii) Let $\mathbf{Z} = (Z^{(1)}, Z^{(2)})^{\mathsf{T}}$ be a random vector whose components are IID $\mathcal{N}(0, \sigma^2)$.
Show that for every message $m \in \{1, \ldots, 7\}$ the conditional probability of error
$p_{\text{MAP}}(\text{error} \mid M = m)$ can be upper-bounded by the probability that the Euclidean
norm of $\mathbf{Z}$ exceeds A/2. Calculate this probability.

(iii) What is the upper bound on $p_{\text{MAP}}(\text{error} \mid M = m)$ that Proposition 21.5.3 yields in
this case? Can you improve it by including fewer terms?


(iv) Compare the different bounds. 


See (Viterbi and Omura, 1979, Chapter 2, Problem 2.2). 


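Exercise 21.4 (Orthogonal Mean Vectors). Let M be uniformly distributed over the set
$\mathcal{M} = \{1, \ldots, M\}$. Let the observable $\mathbf{Y}$ be a random J-vector. Conditional on $M = m$,
the observable $\mathbf{Y}$ is given by

$$\mathbf{Y} = \sqrt{E_s}\, \boldsymbol{\phi}_m + \mathbf{Z},$$

where $\mathbf{Z}$ is a random J-vector whose components are IID $\mathcal{N}(0, \sigma^2)$, and where $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_M$
are orthonormal in the sense that

$$\langle \boldsymbol{\phi}_{m'}, \boldsymbol{\phi}_{m''} \rangle_E = I\{m' = m''\}, \quad m', m'' \in \mathcal{M}.$$

Show that

$$p_{\text{MAP}}(\text{error} \mid M = m) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \bigl( 1 - Q(\xi) \bigr)^{M-1} e^{-\frac{(\xi - \alpha)^2}{2}} \,\mathrm{d}\xi, \tag{21.67}$$

where $\alpha = \sqrt{E_s}/\sigma$.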


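Exercise 21.5 (Equi-Energy Constellations). Consider the setup of Section 21.6.1 with a
uniform prior and with $\|\mathbf{s}_1\|^2 = \cdots = \|\mathbf{s}_M\|^2 = E_s$. Show that the optimal probability of
correct decoding is given by

$$p^*(\text{correct}) = \frac{1}{M} \exp\!\left( -\frac{E_s}{2\sigma^2} \right) \mathsf{E}\!\left[ \exp\!\left( \frac{1}{\sigma^2} \max_{m \in \mathcal{M}} \langle \mathbf{V}, \mathbf{s}_m \rangle_E \right) \right], \tag{21.68}$$

where $\mathbf{V}$ is a random J-vector whose components are IID $\mathcal{N}(0, \sigma^2)$. We recommend the
following approach. Let $\mathcal{D}_1, \ldots, \mathcal{D}_M$ be a partition of $\mathbb{R}^J$ such that for every $m \in \mathcal{M}$,

$$\mathbf{y} \in \mathcal{D}_m \implies \langle \mathbf{y}, \mathbf{s}_m \rangle_E = \max_{m' \in \mathcal{M}} \langle \mathbf{y}, \mathbf{s}_{m'} \rangle_E.$$

(i) Show that

$$p^*(\text{correct}) = \frac{1}{M} \sum_{m \in \mathcal{M}} \Pr[\mathbf{Y} \in \mathcal{D}_m \mid M = m].$$

(ii) Show that the RHS of the above can be written as

$$\frac{1}{M} \exp\!\left( -\frac{E_s}{2\sigma^2} \right) \int_{\mathbb{R}^J} \frac{1}{(2\pi\sigma^2)^{J/2}} \exp\!\left( -\frac{\|\mathbf{y}\|^2}{2\sigma^2} \right) \left( \sum_{m \in \mathcal{M}} I\{\mathbf{y} \in \mathcal{D}_m\} \exp\!\left( \frac{1}{\sigma^2} \langle \mathbf{y}, \mathbf{s}_m \rangle_E \right) \right) \mathrm{d}\mathbf{y}.$$

(iii) Finally show that

$$\sum_{m \in \mathcal{M}} I\{\mathbf{y} \in \mathcal{D}_m\} \exp\!\left( \frac{1}{\sigma^2} \langle \mathbf{y}, \mathbf{s}_m \rangle_E \right) = \exp\!\left( \frac{1}{\sigma^2} \max_{m \in \mathcal{M}} \langle \mathbf{y}, \mathbf{s}_m \rangle_E \right), \quad \mathbf{y} \in \mathbb{R}^J.$$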
memM 


See also Problem 23.7.


Exercise 21.6 (When Is the Union Bound Tight?). Under what conditions on the events
$\mathcal{V}_1, \mathcal{V}_2, \ldots$ is the Union Bound (21.32) tight?


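Exercise 21.7 (The Union of Independent Events). Show that if the events $\mathcal{V}_1, \mathcal{V}_2, \ldots, \mathcal{V}_n$
are independent then

$$\Pr\Bigl( \bigcup_{j=1}^{n} \mathcal{V}_j \Bigr) = 1 - \prod_{j=1}^{n} \bigl( 1 - \Pr(\mathcal{V}_j) \bigr).$$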


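Exercise 21.8 (A Lower Bound on the Probability of a Union). Show that the probability
of the union of n events $\mathcal{V}_1, \ldots, \mathcal{V}_n$ can be lower-bounded by

$$\Pr\Bigl( \bigcup_{j=1}^{n} \mathcal{V}_j \Bigr) \ge \sum_{j=1}^{n} \Pr(\mathcal{V}_j) - \sum_{j=1}^{n-1} \sum_{\ell=j+1}^{n} \Pr(\mathcal{V}_j \cap \mathcal{V}_\ell).$$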


Inequalities of this nature are sometimes called Bonferroni Inequalities. 
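Exercise 21.9 (de Caen’s Inequality). Let X be a RV taking value in the finite set $\mathcal{X}$,
and let $\{\mathcal{A}_i\}_{i \in \mathcal{I}}$ be a finite family of subsets (not necessarily disjoint) of $\mathcal{X}$:

$$\mathcal{A}_i \subseteq \mathcal{X}, \quad i \in \mathcal{I}.$$

Define

$$\Pr(\mathcal{A}_i) = \Pr[X \in \mathcal{A}_i], \quad i \in \mathcal{I},$$

$$\deg(x) = \#\{ i \in \mathcal{I} : x \in \mathcal{A}_i \}, \quad x \in \mathcal{X},$$

where $\#\mathcal{B}$ denotes the cardinality of a set $\mathcal{B}$.

(i) Show that

$$\Pr\Bigl( \bigcup_{i \in \mathcal{I}} \mathcal{A}_i \Bigr) = \sum_{i \in \mathcal{I}} \sum_{x \in \mathcal{A}_i} \frac{\Pr[X = x]}{\deg(x)}.$$

(ii) Use the Cauchy-Schwarz Inequality to show that for every $i \in \mathcal{I}$,

$$\Bigl( \sum_{x \in \mathcal{A}_i} \frac{\Pr[X = x]}{\deg(x)} \Bigr) \Bigl( \sum_{x \in \mathcal{A}_i} \Pr[X = x] \deg(x) \Bigr) \ge \Bigl( \sum_{x \in \mathcal{A}_i} \Pr[X = x] \Bigr)^2.$$

(iii) Use Parts (i) and (ii) to show that

$$\Pr\Bigl( \bigcup_{i \in \mathcal{I}} \mathcal{A}_i \Bigr) \ge \sum_{i \in \mathcal{I}} \frac{ \bigl( \sum_{x \in \mathcal{A}_i} \Pr[X = x] \bigr)^2 }{ \sum_{j \in \mathcal{I}} \sum_{x \in \mathcal{A}_i \cap \mathcal{A}_j} \Pr[X = x] }.$$

(iv) Conclude that

$$\Pr\Bigl( \bigcup_{i \in \mathcal{I}} \mathcal{A}_i \Bigr) \ge \sum_{i \in \mathcal{I}} \frac{ \Pr(\mathcal{A}_i)^2 }{ \sum_{j \in \mathcal{I}} \Pr(\mathcal{A}_i \cap \mathcal{A}_j) }.$$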


This is de Caen’s Bound (de Caen, 1997). 


Exercise 21.10 (Asymptotic Tightness of the Union Bound). Consider the hypothesis
testing problem of Section 21.6 when the prior is uniform and the mean vectors $\mathbf{s}_1, \ldots, \mathbf{s}_M$
are distinct. Show that the Union Bound of (21.59) is asymptotically tight in the sense
that the ratio of the RHS of (21.59) to the LHS tends to one as $\sigma$ tends to zero.


Hint: Use Exercise 21.8. 


Chapter 22 


Sufficient Statistics 


22.1 Introduction 


In layman’s terms, a sufficient statistic for guessing M based on the observable Y
is a random variable or a collection of random variables that contains all the
information in Y that is relevant for guessing M. This is a particularly useful concept
when the sufficient statistic is more concise than the observables. For example, if
we observe the results of a thousand coin tosses $Y_1, \ldots, Y_{1000}$ and we wish to test
whether the coin is fair or has a bias of 1/4, then a sufficient statistic turns out
to be the number of “heads” among the outcomes $Y_1, \ldots, Y_{1000}$.¹ Another example
was encountered in Section 20.12. There the observable was a two-dimensional
random vector, and the sufficient statistic summarized the information that was 
relevant for guessing H in a scalar random variable; see (20.69). 


In this chapter we provide a formal definition of sufficient statistics in the multi- 
hypothesis setting and explore the concept in some detail. We shall see that our 
definition is compatible with Definition 20.12.2, which we gave for the binary case. 
We only address the case where the observations take value in the d-dimensional
Euclidean space $\mathbb{R}^d$. Extensions to observations consisting of a stochastic process
are discussed in Section 26.3. Also, we only treat the case of guessing among a 
finite number of alternatives. We thus consider a finite set of messages 


M = {1,...,M}, (22.1) 


where $M \ge 2$, and we assume that associated with each message $m \in \mathcal{M}$ is a density
$f_{\mathbf{Y}|M=m}(\cdot)$ on $\mathbb{R}^d$, i.e., a nonnegative Borel measurable function that integrates to
one.


The concept of sufficient statistics is defined for the family of densities

$$f_{\mathbf{Y}|M=m}(\cdot), \quad m \in \mathcal{M}; \tag{22.2}$$


it is unrelated to a prior. But when we wish to use it in the context of hypothesis 
testing we need to introduce a probabilistic setting. If, in addition to the family 


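¹Testing whether a coin is fair or not is a more complicated hypothesis testing problem of a
kind that we shall not address. It falls under the category of “composite hypothesis testing.”

$\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$, we introduce a prior $\{\pi_m\}_{m \in \mathcal{M}}$, then we can discuss the pair
$(M, \mathbf{Y})$, where $\Pr[M = m] = \pi_m$, and where, conditionally on $M = m$, the
distribution of $\mathbf{Y}$ is of density $f_{\mathbf{Y}|M=m}(\cdot)$. Thus, once we have introduced a prior
$\{\pi_m\}_{m \in \mathcal{M}}$ we can, for example, discuss the density $f_{\mathbf{Y}}(\cdot)$ of $\mathbf{Y}$ as in (21.11)

$$f_{\mathbf{Y}}(\mathbf{y}) = \sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d, \tag{22.3}$$

and the conditional distribution of M conditional on $\mathbf{Y} = \mathbf{y}$ as in (21.12)

$$\Pr[M = m \mid \mathbf{Y} = \mathbf{y}] \triangleq \begin{cases} \dfrac{\pi_m f_{\mathbf{Y}|M=m}(\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})} & \text{if } f_{\mathbf{Y}}(\mathbf{y}) > 0, \\[1ex] \dfrac{1}{M} & \text{otherwise,} \end{cases} \quad m \in \mathcal{M},\ \mathbf{y} \in \mathbb{R}^d. \tag{22.4}$$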


22.2 Definition and Main Consequence 


In this section we shall define sufficient statistics for a family of densities (22.2). 
We shall then state the main result about this notion, namely, that there is no loss 
in optimality in basing one’s guess on a sufficient statistic. 

Very roughly, $T(\cdot)$ (or sometimes $T(\mathbf{Y})$) forms a sufficient statistic for guessing M
based on $\mathbf{Y}$ if there exists a black box that, when fed $T(\mathbf{y}_{\text{obs}})$ (but not $\mathbf{y}_{\text{obs}}$) and
any prior $\{\pi_m\}$ on M, produces the a posteriori distribution of M given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$.


For technical reasons we make two exceptions. While the black box must always
produce a probability vector, we only require that this vector be the a posteriori
distribution of M given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$ for observations $\mathbf{y}_{\text{obs}}$ that satisfy

$$\sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\text{obs}}) > 0 \tag{22.5}$$

and that lie outside some prespecified set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero. Thus,
if $\mathbf{y}_{\text{obs}}$ is in $\mathcal{Y}_0$ or if (22.5) is violated, then the output of the black box can be any
probability vector. The exception set $\mathcal{Y}_0$ is not allowed to depend on $\{\pi_m\}$. Since
it is of Lebesgue measure zero, the conditional probability that the observation $\mathbf{Y}$
lies in $\mathcal{Y}_0$ is zero:

$$\Pr[\mathbf{Y} \in \mathcal{Y}_0 \mid M = m] = 0, \quad m \in \mathcal{M}. \tag{22.6}$$

Note that the black box need not indicate whether $\mathbf{y}_{\text{obs}}$ is in $\mathcal{Y}_0$ and/or whether
(22.5) holds. Figure 22.1 depicts such a black box.


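Definition 22.2.1 (Sufficient Statistics for M Densities). We say that a mapping
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the densities $f_{\mathbf{Y}|M=1}(\cdot), \ldots, f_{\mathbf{Y}|M=M}(\cdot)$
on $\mathbb{R}^d$ if it is Borel measurable and if for some $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero we
have that for every prior $\{\pi_m\}$ there exist M Borel measurable functions from $\mathbb{R}^{d'}$
to [0, 1]

$$T(\mathbf{y}_{\text{obs}}) \mapsto \psi_m\bigl( \{\pi_m\}, T(\mathbf{y}_{\text{obs}}) \bigr), \quad m \in \mathcal{M},$$

such that the vector

$$\Bigl( \psi_1\bigl( \{\pi_m\}, T(\mathbf{y}_{\text{obs}}) \bigr), \ldots, \psi_M\bigl( \{\pi_m\}, T(\mathbf{y}_{\text{obs}}) \bigr) \Bigr)^{\mathsf{T}}$$

is a probability vector and such that this probability vector is equal to

$$\bigl( \Pr[M = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \ldots, \Pr[M = M \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \bigr)^{\mathsf{T}} \tag{22.7}$$

whenever both the condition $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$ and the condition

$$\sum_{m=1}^{M} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\text{obs}}) > 0 \tag{22.8}$$

are satisfied. Here (22.7) is computed for M having the prior $\{\pi_m\}$ and for the
conditional law of $\mathbf{Y}$ given M corresponding to the given densities.


[Figure 22.1 (block diagram): a box labeled “Black Box” with inputs $T(\mathbf{y}_{\text{obs}})$ and $\{\pi_m\}_{m \in \mathcal{M}}$ and output $\bigl( \psi_1(\{\pi_m\}, T(\mathbf{y}_{\text{obs}})), \ldots, \psi_M(\{\pi_m\}, T(\mathbf{y}_{\text{obs}})) \bigr)^{\mathsf{T}}$.]

Figure 22.1: A black box that when fed any prior $\{\pi_m\}$ and $T(\mathbf{y}_{\text{obs}})$ (but
not the observation $\mathbf{y}_{\text{obs}}$ directly) produces a probability vector that is equal to
$\bigl( \Pr[M = 1 \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}], \ldots, \Pr[M = M \mid \mathbf{Y} = \mathbf{y}_{\text{obs}}] \bigr)^{\mathsf{T}}$ whenever both the condition
$\sum_{m \in \mathcal{M}} \pi_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\text{obs}}) > 0$ and the condition $\mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0$ are satisfied.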


The main result regarding sufficient statistics is that if T(-) forms a sufficient 
statistic, then—even if the transformation T(-) is not reversible—there is no loss 
in optimality in basing one’s guess on T(Y). 

Proposition 22.2.2 (Guessing Based on T(Y) Is Optimal). If $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$
is a sufficient statistic for the M densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$, then, given any
prior $\{\pi_m\}$, there exists an optimal decision rule that bases its decision on $T(\mathbf{Y})$.


Proof. To prove the proposition we shall exhibit a decision rule that is based
on $T(\mathbf{Y})$ and that mimics the MAP rule based on $\mathbf{Y}$. Since the latter is optimal
(Theorem 21.3.3), our proposed rule must also be optimal. Let $\{\psi_m(\cdot)\}$ be as in
Definition 22.2.1. Given $\mathbf{Y} = \mathbf{y}_{\text{obs}}$, the proposed decoder considers the set of all
messages $\tilde{m}$ satisfying

$$\psi_{\tilde{m}}\bigl( \{\pi_m\}, T(\mathbf{y}_{\text{obs}}) \bigr) = \max_{m' \in \mathcal{M}} \psi_{m'}\bigl( \{\pi_m\}, T(\mathbf{y}_{\text{obs}}) \bigr) \tag{22.9}$$

and picks uniformly at random from this set.

We next argue that this decision rule is optimal. To that end we shall show that,
with probability one, this guessing rule is the same as the MAP rule for guessing
based on $\mathbf{Y}$. Indeed, the guess produced by this rule is identical to the one produced
by the MAP rule whenever $\mathbf{y}_{\text{obs}}$ satisfies (22.8) and lies outside $\mathcal{Y}_0$. Since the
probability that $\mathbf{Y}$ satisfies (22.8) is, by (21.13), one, and since the probability
that $\mathbf{Y}$ is outside $\mathcal{Y}_0$ is, by (22.6), also one, it follows from Corollary 21.5.2 that
the probability that $\mathbf{Y}$ satisfies both (22.8) and the condition $\mathbf{Y} \notin \mathcal{Y}_0$ is also one.
Thus, the proposed guessing rule, which bases its decision only on $T(\mathbf{y}_{\text{obs}})$ and
on the prior, has the same performance as the (optimal) MAP decision rule for
guessing M based on $\mathbf{Y}$.


22.3. Equivalent Conditions 


In this section we derive a number of important equivalent definitions for sufficient 
statistics. These will further clarify the concept and will also be useful in identifying 
sufficient statistics. We shall try to state the theorems rigorously, but our proofs 
will be mostly heuristic. Rigorous proofs require some Measure Theory that we 
do not wish to assume. For a rigorous measure-theoretic treatment of this topic 
see (Halmos and Savage, 1949), (Lehmann and Romano, 2005, Section 2.6), or 
(Billingsley, 1995, Section 34).²


22.3.1 The Factorization Theorem 


The following characterization is useful because it is purely algebraic. It explores 
the form that the densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}$ must have for $T(\mathbf{Y})$ to form a sufficient
statistic. Roughly speaking, $T(\cdot)$ is sufficient if the densities in the family all have
the form of a product of two functions, where the first function depends on the
message and on $T(\mathbf{y})$, and where the second function does not depend on the
message but may depend on $\mathbf{y}$. We allow, however, an exception set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of
Lebesgue measure zero, so we only require that for every $m \in \mathcal{M}$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}), \quad \mathbf{y} \notin \mathcal{Y}_0. \tag{22.10}$$

Note that if such a factorization exists, then it also exists with the additional
requirement that the functions be nonnegative. Indeed, if (22.10) holds, then by
the nonnegativity of the densities

$$\begin{aligned}
f_{\mathbf{Y}|M=m}(\mathbf{y}) &= \bigl| f_{\mathbf{Y}|M=m}(\mathbf{y}) \bigr| \\
&= \bigl| g_m\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}) \bigr|, \quad \mathbf{y} \notin \mathcal{Y}_0 \\
&= \bigl| g_m\bigl( T(\mathbf{y}) \bigr) \bigr| \, \bigl| h(\mathbf{y}) \bigr|, \quad \mathbf{y} \notin \mathcal{Y}_0,
\end{aligned}$$

thus yielding a factorization with the nonnegative functions

$$\bigl\{ \mathbf{y} \mapsto \bigl| g_m\bigl( T(\mathbf{y}) \bigr) \bigr| \bigr\}_{m \in \mathcal{M}} \quad \text{and} \quad \mathbf{y} \mapsto |h(\mathbf{y})|.$$

Limiting ourselves to nonnegative factorizations, as we henceforth shall, is helpful
in manipulating inequalities where multiplication by negative numbers requires
changing the direction of the inequality. For our setting the Factorization Theorem
can be stated as follows.³


²Our setting is technically easier because we only consider the case where M is finite and
because we restrict the observation space to $\mathbb{R}^d$.


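Theorem 22.3.1 (The Factorization Theorem). A Borel measurable function
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the M densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$
on $\mathbb{R}^d$ if, and only if, there exists a set $\mathcal{Y}_0 \subset \mathbb{R}^d$ of Lebesgue measure zero and
nonnegative Borel measurable functions $g_1, \ldots, g_M\colon \mathbb{R}^{d'} \to [0, \infty)$ and $h\colon \mathbb{R}^d \to [0, \infty)$
such that for every $m \in \mathcal{M}$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0. \tag{22.11}$$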


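Proof. We begin by showing that if $T(\cdot)$ is a sufficient statistic then there exists a
factorization of the form (22.11). Let the set $\mathcal{Y}_0$ and the functions $\{\psi_m(\cdot)\}$ be as in
Definition 22.2.1. Pick some $\tilde{\pi}_1, \ldots, \tilde{\pi}_M > 0$ that sum to one, e.g., $\tilde{\pi}_m = 1/M$ for
all $m \in \mathcal{M}$, and let M be of the prior $\{\tilde{\pi}_m\}$, so $\Pr[M = m] = \tilde{\pi}_m$ for all $m \in \mathcal{M}$.
Let the conditional law of $\mathbf{Y}$ given M be as specified by the given densities so, in
particular,

$$f_{\mathbf{Y}}(\mathbf{y}) = \sum_{m \in \mathcal{M}} \tilde{\pi}_m f_{\mathbf{Y}|M=m}(\mathbf{y}), \quad \mathbf{y} \in \mathbb{R}^d. \tag{22.12}$$

Since $\{\tilde{\pi}_m\}$ are strictly positive, it follows from (22.12) that

$$\bigl( f_{\mathbf{Y}}(\mathbf{y}) = 0 \bigr) \implies \bigl( f_{\mathbf{Y}|M=m}(\mathbf{y}) = 0, \quad m \in \mathcal{M} \bigr). \tag{22.13}$$

(The only way the sum of nonnegative numbers can be zero is if they are all zero.
Thus, $f_{\mathbf{Y}}(\mathbf{y}) = 0$ always implies that all the terms $\{\tilde{\pi}_m f_{\mathbf{Y}|M=m}(\mathbf{y})\}$ are zero. But
if $\{\tilde{\pi}_m\}$ are strictly positive, then this implies that all the terms $\{f_{\mathbf{Y}|M=m}(\mathbf{y})\}$ are
zero.)

By the definition of the functions $\{\psi_m(\cdot)\}$ and of the conditional probability (22.4),
we have for every $m \in \mathcal{M}$

$$\psi_m\bigl( \tilde{\pi}_1, \ldots, \tilde{\pi}_M, T(\mathbf{y}_{\text{obs}}) \bigr) = \frac{\tilde{\pi}_m f_{\mathbf{Y}|M=m}(\mathbf{y}_{\text{obs}})}{f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}})}, \quad \bigl( \mathbf{y}_{\text{obs}} \notin \mathcal{Y}_0 \text{ and } f_{\mathbf{Y}}(\mathbf{y}_{\text{obs}}) > 0 \bigr). \tag{22.14}$$

We next argue that the densities factorize as

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = \underbrace{\frac{1}{\tilde{\pi}_m} \psi_m\bigl( \tilde{\pi}_1, \ldots, \tilde{\pi}_M, T(\mathbf{y}) \bigr)}_{g_m(T(\mathbf{y}))} \, \underbrace{f_{\mathbf{Y}}(\mathbf{y})}_{h(\mathbf{y})}, \quad \mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0. \tag{22.15}$$


³A different, perhaps more elegant, way to state the theorem is in terms of probability
distributions. Let $P_m$ be the probability distribution on $\mathbb{R}^d$ corresponding to $M = m$, where m
is in the finite set $\mathcal{M}$. Assume that $\{P_m\}$ are dominated by the σ-finite measure $\mu$. Then the
Borel measurable mapping $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the family $\{P_m\}$ if, and
only if, there exists a Borel measurable nonnegative function $h(\cdot)$ from $\mathbb{R}^d$ to $\mathbb{R}$, and M nonnegative,
Borel measurable functions $g_m(\cdot)$ from $\mathbb{R}^{d'}$ to $\mathbb{R}$ such that for each $m \in \mathcal{M}$ the function
$\mathbf{y} \mapsto g_m(T(\mathbf{y}))\, h(\mathbf{y})$ is a version of the Radon-Nikodym derivative $\mathrm{d}P_m/\mathrm{d}\mu$ of $P_m$ with respect
to $\mu$; see (Billingsley, 1995, Theorem 34.6) and (Lehmann and Romano, 2005, Corollary 2.6.1).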




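This can be argued as follows. If $f_{\mathbf{Y}}(\mathbf{y})$ is greater than zero, then (22.15) follows
directly from (22.14). And if $f_{\mathbf{Y}}(\mathbf{y})$ is equal to zero, then the RHS of (22.15) is equal
to zero and, by (22.13), the LHS is also equal to zero.


We next prove that if the densities factorize as in (22.11), then $T(\cdot)$ forms a sufficient
statistic. That is, we show how using the factorization (22.11) we can design
the desired black box. The inputs to the black box are the prior $\{\pi_m\}$ and $T(\mathbf{y})$.
The black box considers the vector

$$\Bigl( \pi_1\, g_1\bigl( T(\mathbf{y}) \bigr), \ldots, \pi_M\, g_M\bigl( T(\mathbf{y}) \bigr) \Bigr)^{\mathsf{T}}. \tag{22.16}$$

If all its components are zero, then the black box produces the uniform distribution
(or any other distribution of the reader’s choice). Otherwise, it produces the above
vector but normalized to sum to one. Thus, if we denote by $\psi_m(\pi_1, \ldots, \pi_M, T(\mathbf{y}))$
the probability that the black box assigns to m when fed $\pi_1, \ldots, \pi_M$ and $T(\mathbf{y})$,
then

$$\psi_m\bigl( \pi_1, \ldots, \pi_M, T(\mathbf{y}) \bigr) = \begin{cases} \dfrac{1}{M} & \text{if } \sum_{m' \in \mathcal{M}} \pi_{m'}\, g_{m'}\bigl( T(\mathbf{y}) \bigr) = 0, \\[1ex] \dfrac{\pi_m\, g_m\bigl( T(\mathbf{y}) \bigr)}{\sum_{m' \in \mathcal{M}} \pi_{m'}\, g_{m'}\bigl( T(\mathbf{y}) \bigr)} & \text{otherwise.} \end{cases} \tag{22.17}$$

To verify that $\psi_m(\pi_1, \ldots, \pi_M, T(\mathbf{y})) = \Pr[M = m \mid \mathbf{Y} = \mathbf{y}]$ whenever $\mathbf{y}$ is such that
$\mathbf{y} \notin \mathcal{Y}_0$ and (22.8) holds, we first note that, by the factorization (22.11),

$$\bigl( f_{\mathbf{Y}}(\mathbf{y}) > 0 \text{ and } \mathbf{y} \notin \mathcal{Y}_0 \bigr) \implies \Bigl( h(\mathbf{y}) \sum_{m'=1}^{M} \pi_{m'}\, g_{m'}\bigl( T(\mathbf{y}) \bigr) > 0 \Bigr),$$

so

$$\bigl( f_{\mathbf{Y}}(\mathbf{y}) > 0 \text{ and } \mathbf{y} \notin \mathcal{Y}_0 \bigr) \implies \Bigl( h(\mathbf{y}) > 0 \text{ and } \sum_{m'=1}^{M} \pi_{m'}\, g_{m'}\bigl( T(\mathbf{y}) \bigr) > 0 \Bigr). \tag{22.18}$$

Consequently, if $\mathbf{y} \notin \mathcal{Y}_0$ and if (22.8) holds, then by (22.18) and (22.17)

$$\Bigl( \psi_1\bigl( \pi_1, \ldots, \pi_M, T(\mathbf{y}) \bigr), \ldots, \psi_M\bigl( \pi_1, \ldots, \pi_M, T(\mathbf{y}) \bigr) \Bigr)^{\mathsf{T}}$$

is equal to the vector in (22.16) but scaled so that its components add to one. But
the a posteriori probability vector is also a scaled version of (22.16) (scaled by
$h(\mathbf{y})/f_{\mathbf{Y}}(\mathbf{y})$) that sums to one. Thus, if $\mathbf{y} \notin \mathcal{Y}_0$ and (22.8) holds, then the vector
produced by the black box is identical to the a posteriori distribution vector.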
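
The black box of (22.17) is short enough to write out. The sketch below implements it and checks it on a hypothetical example that is not from the text: n observations that, conditional on M = m, are IID $\mathcal{N}(\mu_m, \sigma^2)$, with $T(\mathbf{y})$ the sum of the observations and $g_m(t) = \exp\bigl( (\mu_m t - n \mu_m^2 / 2) / \sigma^2 \bigr)$. All names and numerical values are illustrative assumptions.

```python
import math


def black_box(prior, g_values):
    """(22.17): normalize the vector (pi_m g_m(T(y))); uniform if it is all zero."""
    weights = [p * g for p, g in zip(prior, g_values)]
    total = sum(weights)
    if total == 0.0:
        return [1.0 / len(prior)] * len(prior)
    return [w / total for w in weights]


def g_gaussian(mu, t, n, sigma):
    """Hypothetical g_m for n IID N(mu, sigma^2) samples with T(y) = sum of samples."""
    return math.exp((mu * t - n * mu ** 2 / 2.0) / sigma ** 2)


def direct_posterior(prior, mus, y, sigma):
    """A posteriori law computed from the full observation via (22.4)."""
    likelihoods = [math.prod(math.exp(-(yj - mu) ** 2 / (2.0 * sigma ** 2))
                             / math.sqrt(2.0 * math.pi * sigma ** 2) for yj in y)
                   for mu in mus]
    total = sum(p * f for p, f in zip(prior, likelihoods))
    return [p * f / total for p, f in zip(prior, likelihoods)]


prior = [0.2, 0.5, 0.3]
mus = [-1.0, 0.0, 2.0]
y = [0.3, -0.4, 1.1, 0.7]
sigma = 1.5
t = sum(y)  # the statistic T(y); the black box never sees y itself
from_statistic = black_box(prior, [g_gaussian(mu, t, len(y), sigma) for mu in mus])
from_observation = direct_posterior(prior, mus, y, sigma)
```

The two probability vectors agree up to rounding, which is what the second half of the proof asserts: the factor $h(\mathbf{y})$ cancels in the normalization, so only $T(\mathbf{y})$ and the prior are needed.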


22.3.2 Pairwise Sufficiency


We next clarify the connection between sufficient statistics for binary hypothesis
testing and for multi-hypothesis testing. We show that $T(\mathbf{Y})$ forms a sufficient
statistic for the family of densities $\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ if, and only if, for every pair
of messages $m' \ne m''$ in $\mathcal{M}$ we have that $T(\mathbf{Y})$ forms a sufficient statistic for the
densities $f_{\mathbf{Y}|M=m'}(\cdot)$ and $f_{\mathbf{Y}|M=m''}(\cdot)$.

One part of this statement is trivial, namely, that if $T(\cdot)$ is sufficient for the family
$\{f_{\mathbf{Y}|M=m}(\cdot)\}_{m \in \mathcal{M}}$ then it is also sufficient for any pair. Indeed, by the Factorization
Theorem (Theorem 22.3.1), the sufficiency of $T(\cdot)$ for the family implies the
existence of a set of Lebesgue measure zero $\mathcal{Y}_0 \subset \mathbb{R}^d$ and functions $\{g_m\}_{m \in \mathcal{M}}$, $h$
such that for all $\mathbf{y} \in \mathbb{R}^d \setminus \mathcal{Y}_0$

$$f_{\mathbf{Y}|M=m}(\mathbf{y}) = g_m\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}), \quad m \in \mathcal{M}. \tag{22.19}$$

In particular, if we limit ourselves to $m', m'' \in \mathcal{M}$ then for $\mathbf{y} \notin \mathcal{Y}_0$

$$f_{\mathbf{Y}|M=m'}(\mathbf{y}) = g_{m'}\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}),$$

$$f_{\mathbf{Y}|M=m''}(\mathbf{y}) = g_{m''}\bigl( T(\mathbf{y}) \bigr)\, h(\mathbf{y}),$$

which, by the Factorization Theorem, implies the sufficiency of $T(\cdot)$ for the pair of
densities $f_{\mathbf{Y}|M=m'}(\cdot)$, $f_{\mathbf{Y}|M=m''}(\cdot)$.

The nontrivial part of the proposition is that pairwise sufficiency implies sufficiency.
Even this is quite easy when the densities are all strictly positive. It is a bit more
tricky without this assumption.⁴


Proposition 22.3.2 (Pairwise Sufficiency Implies Sufficiency). Consider $M$ densities
$\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ on $\mathbb{R}^d$, and assume that
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for every pair of densities
$f_{Y|M=m'}(\cdot)$, $f_{Y|M=m''}(\cdot)$, where $m' \neq m''$ are both in $\mathcal{M}$. Then
$T(\cdot)$ is a sufficient statistic for the $M$ densities $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$.


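Proof. To prove that $T(\cdot)$ forms a sufficient statistic for $\{f_{Y|M=m}(\cdot)\}_{m=1}^{M}$ we
shall describe an algorithm (black box) that when fed any prior $\{\pi_m\}$ and
$T(y_{\mathrm{obs}})$ (but not $y_{\mathrm{obs}}$) produces an $M$-dimensional probability vector that
is equal to the a posteriori probability distribution vector

\[ \bigl(\Pr[M = 1 \mid Y = y_{\mathrm{obs}}], \ldots, \Pr[M = M \mid Y = y_{\mathrm{obs}}]\bigr) \tag{22.20} \]

whenever $y_{\mathrm{obs}} \in \mathbb{R}^d$ is such that

\[ y_{\mathrm{obs}} \notin \mathcal{Y}_0 \quad\text{and}\quad \sum_{m=1}^{M} \pi_m\, f_{Y|M=m}(y_{\mathrm{obs}}) > 0, \tag{22.21} \]

where $\mathcal{Y}_0$ is a subset of $\mathbb{R}^d$ that does not depend on the prior $\{\pi_m\}$ and
that is of Lebesgue measure zero.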


To describe the algorithm we first use the Factorization Theorem (Theorem 22.3.1)
to recast the proposition's hypothesis as saying that for every pair $m' \neq m''$ in
$\mathcal{M}$ there exists a set $\mathcal{Y}_0^{(m',m'')} \subset \mathbb{R}^d$ of Lebesgue measure zero
and there exist nonnegative functions
$g_{m'}^{(m',m'')}, g_{m''}^{(m',m'')}\colon \mathbb{R}^{d'} \to \mathbb{R}$ and
$h^{(m',m'')}\colon \mathbb{R}^d \to \mathbb{R}$ such that

\[ f_{Y|M=m'}(y) = g_{m'}^{(m',m'')}(T(y))\, h^{(m',m'')}(y), \quad y \in \mathbb{R}^d \setminus \mathcal{Y}_0^{(m',m'')}, \tag{22.22a} \]
\[ f_{Y|M=m''}(y) = g_{m''}^{(m',m'')}(T(y))\, h^{(m',m'')}(y), \quad y \in \mathbb{R}^d \setminus \mathcal{Y}_0^{(m',m'')}. \tag{22.22b} \]

Let

\[ \mathcal{Y}_0 = \bigcup_{\substack{m',m''\in\mathcal{M} \\ m'\neq m''}} \mathcal{Y}_0^{(m',m'')}, \tag{22.23} \]

and note that, being the union of a finite number of sets of Lebesgue measure zero,
$\mathcal{Y}_0$ is of Lebesgue measure zero.

⁴This result does not extend to the case where the random variable $M$ can take on infinitely
many values.


We now use the above functions $g_{m'}^{(m',m'')}$, $g_{m''}^{(m',m'')}$ to describe the
algorithm. Note that $y_{\mathrm{obs}}$ is never fed directly to the algorithm; only
$T(y_{\mathrm{obs}})$ is used. Let the prior

\[ \pi_m = \Pr[M = m], \quad m \in \mathcal{M} \tag{22.24} \]

be given, and assume without loss of generality that it is nondegenerate in the
sense that

\[ \pi_m > 0, \quad m \in \mathcal{M}. \tag{22.25} \]

(If that is not the case, we can set the black box to produce 0 in the coordinates
of the output vector corresponding to messages of prior probability zero and then
proceed to ignore such messages.) Let $y_{\mathrm{obs}} \in \mathbb{R}^d$ be arbitrary.


There are two phases to the algorithm. In the first phase the algorithm produces
some $m^* \in \mathcal{M}$ whose a posteriori probability is guaranteed to be positive
whenever (22.21) holds. In fact, if (22.21) holds, then no message has an a posteriori
probability higher than that of $m^*$ (but this is immaterial to us because we are
not content with showing that from $T(y_{\mathrm{obs}})$ we can compute the message that a
posteriori has the highest probability; we want to be able to compute the entire
a posteriori probability vector). In the second phase the algorithm uses $m^*$ to
compute the desired a posteriori probability vector.


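The first phase of the algorithm runs in $M$ steps. In Step 1 we set $m[1] = 1$. In
Step 2 we set

\[
m[2] =
\begin{cases}
1 & \text{if } \dfrac{\pi_1\, g_1^{(1,2)}(T(y_{\mathrm{obs}}))}{\pi_2\, g_2^{(1,2)}(T(y_{\mathrm{obs}}))} > 1, \\[2ex]
2 & \text{otherwise.}
\end{cases}
\]

And in Step $\nu$ for $\nu \in \{2,\ldots,M\}$ we set

\[
m[\nu] =
\begin{cases}
m[\nu-1] & \text{if } \dfrac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))} > 1, \\[2ex]
\nu & \text{otherwise.}
\end{cases}
\tag{22.26}
\]

Here we use the convention that $a/0 = +\infty$ whenever $a > 0$ and that $0/0 = 1$. We
complete the first phase by setting

\[ m^* = m[M]. \tag{22.27} \]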
In the second phase we compute the vector

\[
\alpha[m] = \frac{\pi_m\, g_m^{(m,m^*)}(T(y_{\mathrm{obs}}))}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}(T(y_{\mathrm{obs}}))},
\quad m \in \mathcal{M}. \tag{22.28}
\]

If at least one of the components of $\alpha[\cdot]$ is $+\infty$, then we produce as the
algorithm's output the uniform distribution on $\mathcal{M}$. (The output corresponding to this
case is immaterial because it will turn out that this case is only possible if
$y_{\mathrm{obs}}$ is such that either $y_{\mathrm{obs}} \in \mathcal{Y}_0$ or
$\sum_{m} \pi_m\, f_{Y|M=m}(y_{\mathrm{obs}}) = 0$, in which case the algorithm's
output is not required to be equal to the a posteriori distribution.) Otherwise, the
algorithm's output is the vector

\[
\Biggl( \frac{\alpha[1]}{\sum_{m=1}^{M} \alpha[m]}, \ldots, \frac{\alpha[M]}{\sum_{m=1}^{M} \alpha[m]} \Biggr). \tag{22.29}
\]


Having described the algorithm, we now proceed to prove that it produces the
a posteriori probability vector whenever (22.21) holds. We need to show that if
(22.21) holds then

\[
\Pr[M = m \mid Y = y_{\mathrm{obs}}] = \frac{\alpha[m]}{\sum_{m'\in\mathcal{M}} \alpha[m']},
\quad m \in \mathcal{M}. \tag{22.30}
\]

Since there is nothing to prove if (22.21) does not hold, we shall henceforth assume
for the rest of the proof that it does. In this case we have by (22.4)

\[
\Pr[M = m \mid Y = y_{\mathrm{obs}}] = \frac{\pi_m\, f_{Y|M=m}(y_{\mathrm{obs}})}{f_Y(y_{\mathrm{obs}})},
\quad m \in \mathcal{M}. \tag{22.31}
\]


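We shall prove (22.30) in two steps. In the first step we show that the result $m^*$
of the algorithm's first phase satisfies

\[ \Pr[M = m^* \mid Y = y_{\mathrm{obs}}] > 0. \tag{22.32} \]

To establish (22.32) we shall prove the stronger statement that

\[ \Pr[M = m^* \mid Y = y_{\mathrm{obs}}] = \max_{m\in\mathcal{M}} \Pr[M = m \mid Y = y_{\mathrm{obs}}]. \tag{22.33} \]

This latter statement follows from the more general claim that for any $\nu \in \mathcal{M}$ (and
not only for $\nu = M$) we have, subject to (22.21),

\[ \Pr\bigl[M = m[\nu] \bigm| Y = y_{\mathrm{obs}}\bigr] = \max_{1 \le m \le \nu} \Pr[M = m \mid Y = y_{\mathrm{obs}}]. \tag{22.34} \]

For $\nu = 1$, Statement (22.34) is trivial. For $2 \le \nu \le M$, (22.34) follows from

\[
\Pr\bigl[M = m[\nu] \bigm| Y = y_{\mathrm{obs}}\bigr] =
\max\Bigl\{ \Pr[M = \nu \mid Y = y_{\mathrm{obs}}],\; \Pr\bigl[M = m[\nu-1] \bigm| Y = y_{\mathrm{obs}}\bigr] \Bigr\}, \tag{22.35}
\]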


which we now prove. We prove (22.35) by considering two cases separately depending
on whether $\Pr[M = \nu \mid Y = y_{\mathrm{obs}}]$ and $\Pr[M = m[\nu-1] \mid Y = y_{\mathrm{obs}}]$
are both zero or not. In the former case there is nothing to prove because (22.35) holds
irrespective of whether (22.26) results in $m[\nu]$ being set to $\nu$ or to $m[\nu-1]$. In the
latter case we have by (22.31) and (22.25) that $f_{Y|M=\nu}(y_{\mathrm{obs}})$ and
$f_{Y|M=m[\nu-1]}(y_{\mathrm{obs}})$ are not both zero. Consequently, by (22.22), in this case
$h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}})$ is not only nonnegative but strictly positive. It follows
that the choice (22.26) guarantees (22.35) because

\[
\begin{aligned}
\frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))}
&= \frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))\, h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}})}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))\, h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}})} \\
&= \frac{\pi_{m[\nu-1]}\, g_{m[\nu-1]}^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))\, h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}}) / f_Y(y_{\mathrm{obs}})}{\pi_\nu\, g_\nu^{(m[\nu-1],\nu)}(T(y_{\mathrm{obs}}))\, h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}}) / f_Y(y_{\mathrm{obs}})} \\
&= \frac{\Pr\bigl[M = m[\nu-1] \bigm| Y = y_{\mathrm{obs}}\bigr]}{\Pr[M = \nu \mid Y = y_{\mathrm{obs}}]},
\end{aligned}
\]

where the first equality follows because $h^{(m[\nu-1],\nu)}(y_{\mathrm{obs}})$ is strictly positive;
the second because in this part of the proof we are assuming (22.21); and where the
last equality follows from (22.22) and (22.31). This establishes (22.35), which
implies (22.34), which in turn implies (22.33), which in turn implies (22.32), and
thus concludes the proof of the first step.


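In the second step of the proof we use (22.32) to establish (22.30). This is
straightforward because, in view of (22.31), we have that (22.32) implies that
$f_{Y|M=m^*}(y_{\mathrm{obs}}) > 0$ so, by (22.22b), we have that

\[ h^{(m,m^*)}(y_{\mathrm{obs}}) > 0, \quad m \in \mathcal{M}, \]
\[ g_{m^*}^{(m,m^*)}(T(y_{\mathrm{obs}})) > 0, \quad m \in \mathcal{M}. \]

Consequently

\[
\begin{aligned}
\alpha[m] &= \frac{\pi_m\, g_m^{(m,m^*)}(T(y_{\mathrm{obs}}))}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}(T(y_{\mathrm{obs}}))} \\
&= \frac{\pi_m\, g_m^{(m,m^*)}(T(y_{\mathrm{obs}}))\, h^{(m,m^*)}(y_{\mathrm{obs}})}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}(T(y_{\mathrm{obs}}))\, h^{(m,m^*)}(y_{\mathrm{obs}})} \\
&= \frac{\pi_m\, g_m^{(m,m^*)}(T(y_{\mathrm{obs}}))\, h^{(m,m^*)}(y_{\mathrm{obs}}) / f_Y(y_{\mathrm{obs}})}{\pi_{m^*}\, g_{m^*}^{(m,m^*)}(T(y_{\mathrm{obs}}))\, h^{(m,m^*)}(y_{\mathrm{obs}}) / f_Y(y_{\mathrm{obs}})} \\
&= \frac{\Pr[M = m \mid Y = y_{\mathrm{obs}}]}{\Pr[M = m^* \mid Y = y_{\mathrm{obs}}]},
\end{aligned}
\]

from which (22.30) follows by (22.32).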
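
The two-phase algorithm of the proof can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the names `ratio`, `weighted_g`, and `pairwise_black_box` are hypothetical, the pairwise factors are stored in a dictionary `pair_g` keyed by the sorted pair `(a, b)`, messages are numbered from 1, the prior is assumed nondegenerate as in (22.25), and the component $\alpha[m^*]$ is set to 1 (an assumption made here, since the pairwise factors are only defined for distinct messages).

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# pair_g[(a, b)] = (g_a, g_b) holds the two g-factors of the pair {a, b}, a < b.
PairG = Dict[Tuple[int, int], Tuple[Callable[[float], float], Callable[[float], float]]]


def ratio(num: float, den: float) -> float:
    """Quotient with the conventions a/0 = +inf for a > 0 and 0/0 = 1."""
    if den == 0:
        return 1.0 if num == 0 else math.inf
    return num / den


def weighted_g(prior: Sequence[float], pair_g: PairG, m: int, other: int, t: float) -> float:
    """pi_m times the g-factor of message m within the pair {m, other}."""
    a, b = min(m, other), max(m, other)
    g_a, g_b = pair_g[(a, b)]
    return prior[m - 1] * (g_a if m == a else g_b)(t)


def pairwise_black_box(prior: Sequence[float], pair_g: PairG, t: float) -> List[float]:
    """Posterior vector computed from the prior and t = T(y_obs) only."""
    M = len(prior)
    # Phase 1: keep the current candidate only if its weighted factor wins.
    m_star = 1
    for nu in range(2, M + 1):
        r = ratio(weighted_g(prior, pair_g, m_star, nu, t),
                  weighted_g(prior, pair_g, nu, m_star, t))
        if not r > 1:
            m_star = nu
    # Phase 2: the ratios alpha[m] of (22.28), taken relative to m_star.
    alpha = [1.0 if m == m_star else
             ratio(weighted_g(prior, pair_g, m, m_star, t),
                   weighted_g(prior, pair_g, m_star, m, t))
             for m in range(1, M + 1)]
    if any(math.isinf(a) for a in alpha):
        return [1.0 / M] * M  # immaterial case: output the uniform distribution
    total = sum(alpha)
    return [a / total for a in alpha]
```

Note that each pair may use a differently scaled factorization (the common factor cancels in every ratio), which is exactly why only pairwise factorizations are needed.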


22.3.3. Markov Condition 


We now characterize sufficient statistics using Markov chains and conditional
independence. These concepts were introduced in Section 20.11. The key result we
ask the reader to recall is Theorem 20.11.3. We rephrase it for our present setting
as follows.


Proposition 22.3.3. The statement that $M -\!\circ\!- T(Y) -\!\circ\!- Y$ forms a Markov chain
is equivalent to each of the following statements:


(a) The conditional distribution of M given (T(Y), Y) is the same as given T(Y). 
(b) M and Y are conditionally independent given T(Y). 
(c) The conditional distribution of Y given (M,T(Y)) is the same as given T(Y). 


Statement (a) can also be written as: 
(a’) The conditional distribution of M given Y is the same as given T(Y). 


Indeed, the conditional distribution of any random variable—in particular M— 
given (T(Y), Y) is the same as given Y only, because T(Y) carries no information 
that is not in Y. 


Statement (a') can be rephrased as saying that the conditional distribution of $M$
given $Y$ can be computed from $T(Y)$. Since this is the key requirement of sufficient
statistics, we obtain:


Proposition 22.3.4. A Borel measurable function $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a
sufficient statistic for the $M$ densities $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ if, and only if,
for any prior $\{\pi_m\}$

\[ M -\!\circ\!- T(Y) -\!\circ\!- Y \tag{22.36} \]

forms a Markov chain.


Proof. The proof of this proposition is omitted. It is not difficult, but it requires
some measure-theoretic tools.⁵


Using Proposition 22.3.4 and Proposition 22.3.3 (cf. (b)) we obtain that a Borel
measurable function $T(\cdot)$ forms a sufficient statistic for guessing $M$ based on $Y$ if,
and only if, for any prior $\{\pi_m\}$ on $\mathcal{M}$, the message $M$ and the observation $Y$
are conditionally independent given $T(Y)$.


We next explore the implications of Proposition 22.3.4 and the equivalence of the
Markovity $M -\!\circ\!- T(Y) -\!\circ\!- Y$ and Statement (c) in Proposition 22.3.3. These imply
that a Borel measurable function $T(\cdot)$ forms a sufficient statistic if, and only if, the
conditional distribution of $Y$ given $(T(Y), M = m)$ is the same for all $m \in \mathcal{M}$. Or,
in other words, a Borel measurable function $T(\cdot)$ forms a sufficient statistic if, and
only if, the conditional distribution of $Y$ given $T(Y)$ does not depend on which
of the densities in $\{f_{Y|M=m}(\cdot)\}$ governs the law of $Y$. This characterization has
interesting implications regarding the possibility of simulating observables. These
implications are explored next.


⁵If $T(\cdot)$ forms a sufficient statistic, then by Definition 22.2.1 $\psi_m(\{\pi_m\}, T(Y))$ is a
version of the conditional probability that $M = m$ conditional on the σ-algebra generated by
$Y$, and it is also measurable with respect to the σ-algebra generated by $T(Y)$. The reverse
direction follows from (Lehmann and Romano, 2005, Lemma 2.3.1).


[Figure 22.2: block diagram with the blocks "Random Number Generator",
"$P_{\tilde{Y} \mid T(Y) = T(y_{\mathrm{obs}})}$", and "Given Rule for Guessing $M$ based on $Y$".]

Figure 22.2: If $T(Y)$ forms a sufficient statistic for guessing $M$ based on $Y$, then
(even though $Y$ cannot typically be recovered from $T(Y)$) the performance of any
given detector based on $Y$ can be achieved based on $T(Y)$ and a local random
number generator as follows. Using $T(y_{\mathrm{obs}})$ and local randomness $\Theta$, one produces
a $\tilde{Y}$ whose conditional law given $M = m$ is the same as that of $Y$, for each
$m \in \mathcal{M}$. One then feeds $\tilde{Y}$ to the given detector.


22.3.4 Simulating Observables 


For $T(Y)$ to form a sufficient statistic, we do not require that $T(\cdot)$ be invertible, i.e.,
that $Y$ be recoverable from $T(Y)$. Indeed, the notion of sufficient statistics is most
useful when this transformation is not invertible, in which case $T(\cdot)$ "summarizes"
the information in the observation $Y$ that is needed for guessing $M$. Nevertheless,
as we shall next show, if $T(Y)$ forms a sufficient statistic, then from $T(Y)$ we
can produce (using a local random number generator) a vector $\tilde{Y}$ that appears
statistically like $Y$ in the sense that the conditional law of $\tilde{Y}$ given $M$ is identical
to the conditional law of $Y$ given $M$.


To expand on this, we first explain what we mean by "we can produce ... $\tilde{Y}$" and
then elaborate on the consequences of the vector $\tilde{Y}$ having the same conditional
law given $M = m$ as $Y$. By "producing" $\tilde{Y}$ from $T(Y)$ we mean that $\tilde{Y}$ is the
result of processing $T(Y)$ with respect to $M$. Stated differently, to every
$t \in \mathbb{R}^{d'}$ there corresponds a probability distribution $P_{\tilde{Y} \mid t}$ (not dependent
on $m$) that can be used to generate $\tilde{Y}$ as follows: having observed $T(y_{\mathrm{obs}})$, we
use a local random number generator to generate the vector $\tilde{Y}$ according to the
distribution $P_{\tilde{Y} \mid t}$, where $t = T(y_{\mathrm{obs}})$; see Figure 22.2.


By $\tilde{Y}$ appearing statistically the same as $Y$ we mean that the conditional law of
$\tilde{Y}$ given $M = m$ is the same as that of $Y$, i.e., is of density $f_{Y|M=m}(\cdot)$.
Consequently, anything that can be learned about $M$ from $Y$ can also be learned about $M$
from $\tilde{Y}$. Also, any guessing device that was designed to guess $M$ based on the input
$Y$ will yield the same probability of error when, instead of being fed $Y$, it is fed
$\tilde{Y}$. Thus, if $p(\text{error} \mid M = m)$ is the conditional error probability associated with
a guessing device that is fed $Y$, then it is also the conditional probability of error that
will be incurred by this device if, rather than $Y$, it is fed $\tilde{Y}$; see Figure 22.2.

Before stating this as a theorem, let us consider the following simple example.
Suppose that our observation consists of $d$ random variables $Y_1, \ldots, Y_d$ and that,
conditional on $H = 0$, these random variables are IID Bernoulli($p_0$), i.e., they
each take on the value 1 with probability $p_0$ and the value 0 with probability
$1 - p_0$. Conditional on $H = 1$, these $d$ random variables are IID Bernoulli($p_1$).
Here $0 < p_0, p_1 < 1$ and $p_0 \neq p_1$. Consequently, the conditional probability mass
functions are


\[
P_{Y_1,\ldots,Y_d|H=0}(y_1,\ldots,y_d) = \prod_{j=1}^{d} p_0^{y_j} (1 - p_0)^{1 - y_j}
= p_0^{\sum_{j=1}^{d} y_j}\, (1 - p_0)^{d - \sum_{j=1}^{d} y_j}
\]

and

\[
P_{Y_1,\ldots,Y_d|H=1}(y_1,\ldots,y_d) = p_1^{\sum_{j=1}^{d} y_j}\, (1 - p_1)^{d - \sum_{j=1}^{d} y_j},
\]

so $T(y_1,\ldots,y_d) = \sum_{j=1}^{d} y_j$ forms a sufficient statistic by the Factorization
Theorem.⁶ From $T(y_1,\ldots,y_d)$ one cannot recover the sequence $y_1,\ldots,y_d$. Indeed,
specifying that $T(y_1,\ldots,y_d) = t$ does not determine which of the random variables
is one; it only determines how many of them are one. There are thus $\binom{d}{t}$
possible outcomes $(y_1,\ldots,y_d)$ that are consistent with $T(y_1,\ldots,y_d)$ being equal to $t$.
We leave it to the reader to verify that if we use a local random number generator
to pick one of these outcomes uniformly at random then the result
$(\tilde{Y}_1,\ldots,\tilde{Y}_d)$ will have the same conditional law given $H$ as $(Y_1,\ldots,Y_d)$.
We do not, of course, guarantee that $(\tilde{Y}_1,\ldots,\tilde{Y}_d)$ be identical to
$(Y_1,\ldots,Y_d)$. (The transformation $T(\cdot)$ is, after all, not reversible.)
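
The verification left to the reader can be carried out exactly for small $d$ by enumeration. The following Python sketch (the function names are hypothetical, and exact rational arithmetic is used so that the comparison is an equality rather than an approximation) computes the law of the original sequence and the law of the sequence obtained by first computing the number of ones and then picking uniformly among the sequences with that many ones.

```python
from fractions import Fraction
from itertools import product
from math import comb


def law_of_Y(p: Fraction, d: int) -> dict:
    """PMF of d IID Bernoulli(p) bits, keyed by the outcome tuple."""
    return {y: p ** sum(y) * (1 - p) ** (d - sum(y))
            for y in product((0, 1), repeat=d)}


def law_of_simulated_Y(p: Fraction, d: int) -> dict:
    """PMF of the simulated sequence: draw Y, keep only t = sum(Y), then pick
    uniformly among the C(d, t) sequences whose sum equals t."""
    source = law_of_Y(p, d)
    pmf_t = {t: Fraction(0) for t in range(d + 1)}
    for y, prob in source.items():
        pmf_t[sum(y)] += prob          # Pr[T = t]
    return {y: pmf_t[sum(y)] / comb(d, sum(y)) for y in source}
```

For any $p$ the two dictionaries coincide, so the simulated sequence has the same conditional law under $H = 0$ (with $p = p_0$) and under $H = 1$ (with $p = p_1$), even though the uniform pick itself does not depend on the hypothesis.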


For additional insight let us consider our example of (20.66). For
$T(y_1, y_2) = y_1^2 + y_2^2$ we can generate $\tilde{Y}$ from a uniform random variable
$\Theta \sim \mathcal{U}([0,1))$ as

\[ \tilde{Y}_1 = \sqrt{T(Y)}\, \cos(2\pi\Theta), \]
\[ \tilde{Y}_2 = \sqrt{T(Y)}\, \sin(2\pi\Theta). \]

That is, after observing $T(y_{\mathrm{obs}}) = t$, we generate $(\tilde{Y}_1, \tilde{Y}_2)$ uniformly over
the tuples that are at radius $\sqrt{t}$ from the origin.
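
A minimal Python sketch of this generation step follows. The function name `simulate_on_circle` is hypothetical, and Python's standard `random.Random` plays the role of the local random number generator.

```python
import math
import random
from typing import Tuple


def simulate_on_circle(t: float, rng: random.Random) -> Tuple[float, float]:
    """Generate a point uniformly on the circle of radius sqrt(t), using only
    t = T(y_obs) and local randomness Theta ~ U([0, 1))."""
    theta = rng.random()               # uniform on [0, 1)
    radius = math.sqrt(t)
    return (radius * math.cos(2 * math.pi * theta),
            radius * math.sin(2 * math.pi * theta))
```

By construction the generated pair satisfies $\tilde{y}_1^2 + \tilde{y}_2^2 = t$ up to rounding, so it carries the same value of the statistic as the original observation while its angle is freshly randomized.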


This last example also demonstrates the difficulty of stating the result. The random
vector $Y$ in this example has a density, both when conditioned on $H = 0$ and when
conditioned on $H = 1$. The same applies to the random variable $T(Y)$. However,
the distribution that is used to generate $\tilde{Y}$ from $T(Y)$ is neither discrete nor has
a density. All its mass is concentrated on the circle of radius $\sqrt{t}$, so it cannot have
a density, and it is uniformly distributed over that circle, so it cannot be discrete.


Theorem 22.3.5 (Simulating the Observables from the Sufficient Statistic). Let
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ be Borel measurable and let
$f_{Y|M=1}(\cdot), \ldots, f_{Y|M=M}(\cdot)$ be $M$ densities on $\mathbb{R}^d$. Then the following
two statements are equivalent:


(a) $T(\cdot)$ forms a sufficient statistic for the given densities.

(b) To every $t$ in $\mathbb{R}^{d'}$ there corresponds a distribution on $\mathbb{R}^d$ such that the
following holds: for every $m \in \{1,\ldots,M\}$, if $Y = y_{\mathrm{obs}}$ is generated according
to the density $f_{Y|M=m}(\cdot)$ and if the random vector $\tilde{Y}$ is then generated
according to the distribution corresponding to $t$, where $t = T(y_{\mathrm{obs}})$, then $\tilde{Y}$ is
of density $f_{Y|M=m}(\cdot)$.


⁶For illustration purposes we are extending the discussion here to discrete distributions.


Proof. For a measure-theoretic statement and proof see (Lehmann and Romano,
2005, Theorem 2.6.1). Here we only present some intuition. Ignoring some of the
technical details, the proof is very simple. The sufficiency of $T(\cdot)$ is equivalent
to $M -\!\circ\!- T(Y) -\!\circ\!- Y$ forming a Markov chain for every prior on $M$. This latter
condition is equivalent by Proposition 22.3.3 (cf. (c)) to the conditional distribution
of $Y$ given $(T(Y), M)$ being the same as given $T(Y)$ only. This latter condition
is equivalent to the conditional distribution of $Y$ given $T(Y)$ not depending on
which density in the family $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ was used to generate $Y$, i.e., to
the existence of a conditional distribution of $Y$ given $T(Y)$ that does not depend on
$m \in \mathcal{M}$.


22.4 Identifying Sufficient Statistics 


Often a sufficient statistic can be identified without having to compute and factorize 
the conditional densities of the observation. A number of such cases are described 
in this section. 


22.4.1 Invertible Transformation 


We begin by showing that, ignoring some technical details, any invertible transfor- 
mation forms a sufficient statistic. It may not be a particularly helpful sufficient 
statistic because it does not “summarize” the observation, but it is a sufficient 
statistic nonetheless. 


Proposition 22.4.1 (Reversible Transformations Yield Sufficient Statistics). If
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ is Borel measurable with a Borel measurable inverse, then
$T(\cdot)$ forms a sufficient statistic for guessing $M$ based on $Y$.


Proof. We provide two proofs. The first uses the definition. We need to verify that
from $T(y_{\mathrm{obs}})$ one can compute the conditional distribution of $M$ given
$Y = y_{\mathrm{obs}}$. This is obvious because if $t = T(y_{\mathrm{obs}})$, then one can compute
$\Pr[M = m \mid Y = y_{\mathrm{obs}}]$ from $t$ by first applying the inverse $T^{-1}(t)$ to recover
$y_{\mathrm{obs}}$ and by then substituting the result in the expression for
$\Pr[M = m \mid Y = y_{\mathrm{obs}}]$ (21.12).


A second proof can be based on Proposition 22.3.4. We need to verify that for any
prior $\{\pi_m\}$

\[ M -\!\circ\!- T(Y) -\!\circ\!- Y \]

forms a Markov chain. To this end we note that, by Theorem 20.11.3, it suffices
to verify that $M$ and $Y$ are conditionally independent given $T(Y)$. This is clear
because the invertibility of $T(\cdot)$ guarantees that, conditional on $T(Y)$, the random
vector $Y$ is deterministic and hence independent of any random variable and a
fortiori of $M$.


22.4.2 A Sufficient Statistic Is Computable from the Statistic 


Intuitively, we think about T(-) as forming a sufficient statistic if T(Y) contains 
all the information about Y that is relevant to guessing M. For this intuition to 
make sense it had better be the case that if T(-) forms a sufficient statistic for 
guessing M based on Y, and if T(Y) is computable from S(Y), then S(-) also 
forms a sufficient statistic. Fortunately, this is so: 


Proposition 22.4.2. Suppose that a Borel measurable mapping
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the $M$ densities
$\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ on $\mathbb{R}^d$. Let the mapping
$S\colon \mathbb{R}^d \to \mathbb{R}^{d''}$ be Borel measurable. If $T(\cdot)$ can be written as the
composition $\psi \circ S$ of $S$ with some Borel measurable function
$\psi\colon \mathbb{R}^{d''} \to \mathbb{R}^{d'}$, then $S(\cdot)$ also forms a sufficient statistic for these
densities.


Proof. We need to show that $\Pr[M = m \mid Y = y_{\mathrm{obs}}]$ is computable from
$S(y_{\mathrm{obs}})$. This follows because, by assumption, $T(y_{\mathrm{obs}})$ is computable from
$S(y_{\mathrm{obs}})$ and because the sufficiency of $T(\cdot)$ implies that
$\Pr[M = m \mid Y = y_{\mathrm{obs}}]$ is computable from $T(y_{\mathrm{obs}})$.


22.4.3 Establishing Sufficiency in Two Steps 


It is sometimes convenient to establish sufficiency in two steps: in the first step 
we establish that T(Y) is sufficient for guessing M based on Y, and in the second 
step we establish that S(T) is sufficient for guessing M based on T(Y). The 
next proposition demonstrates that it then follows that S(T(Y)) forms a sufficient 
statistic for guessing M based on Y. 


Proposition 22.4.3. If $T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ forms a sufficient statistic for the $M$
densities $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ and if $S\colon \mathbb{R}^{d'} \to \mathbb{R}^{d''}$ forms a
sufficient statistic for the corresponding family of densities of $T(Y)$, then the
composition $S \circ T$ forms a sufficient statistic for the densities
$\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$.


Proof. We shall establish the sufficiency of $S \circ T$ by proving that for any prior $\{\pi_m\}$

\[ M -\!\circ\!- S(T(Y)) -\!\circ\!- Y. \]

This follows because for every $m \in \mathcal{M}$ and every $y_{\mathrm{obs}} \in \mathbb{R}^d$

\[
\begin{aligned}
\Pr\bigl[M = m \bigm| S(T(Y)) = S(T(y_{\mathrm{obs}}))\bigr]
&= \Pr\bigl[M = m \bigm| T(Y) = T(y_{\mathrm{obs}})\bigr] \\
&= \Pr[M = m \mid Y = y_{\mathrm{obs}}],
\end{aligned}
\]

where the first equality follows from the sufficiency of $S(T(Y))$ for guessing $M$
based on $T(Y)$, and where the second equality follows from the sufficiency of $T(Y)$
for guessing $M$ based on $Y$.


22.4.4 Guessing whether M Lies in a Given Subset of $\mathcal{M}$


We motivate the next result with the following example, which arises in the detection
of PAM signals in white Gaussian noise (Section 28.3 ahead). Suppose that
the distribution of the observable $Y$ is determined by the value of a $k$-tuple of bits
$(D_1, \ldots, D_k)$. Thus, to each of the $2^k$ values that the $k$-tuple $(D_1, \ldots, D_k)$ can
take, there corresponds a distribution on $Y$ of some given density
$f_{Y|D_1=d_1,\ldots,D_k=d_k}(\cdot)$. Suppose now that $T(\cdot)$ forms a sufficient statistic for this
family of $M = 2^k$ densities. The result we next describe guarantees that $T(\cdot)$ is also
sufficient for the binary hypothesis testing problem of guessing whether a specific bit
$D_j$ is zero or one. More precisely, we shall show that if $\{\pi_{(d_1,\ldots,d_k)}\}$ is any
nondegenerate prior on the $2^k$ different $k$-tuples of bits, then $T(\cdot)$ forms a sufficient
statistic for the two densities $f_{Y|D_j=0}(\cdot)$ and $f_{Y|D_j=1}(\cdot)$.


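Proposition 22.4.4 (Guessing whether $M$ Is in $\mathcal{K}$). Let
$T\colon \mathbb{R}^d \to \mathbb{R}^{d'}$ form a sufficient statistic for the $M$ densities
$\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$. Let the set $\mathcal{K} \subset \mathcal{M}$ be a nonempty strict
subset of $\mathcal{M}$. Let $\{\pi_m\}$ be a prior on $\mathcal{M}$ satisfying

\[ 0 < \sum_{m\in\mathcal{K}} \pi_m < 1. \]

Then $T(\cdot)$ forms a sufficient statistic for the two densities

\[
y \mapsto \sum_{m\in\mathcal{K}} \pi_m\, f_{Y|M=m}(y) \quad\text{and}\quad
y \mapsto \sum_{m\notin\mathcal{K}} \pi_m\, f_{Y|M=m}(y). \tag{22.37}
\]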


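Proof. By the Factorization Theorem it follows that the sufficiency of $T(\cdot)$ for the
family $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$ is equivalent to the condition that for every
$m \in \mathcal{M}$ and for every $y \notin \mathcal{Y}_0$

\[ f_{Y|M=m}(y) = g_m(T(y))\, h(y), \tag{22.38} \]

where the set $\mathcal{Y}_0 \subset \mathbb{R}^d$ is of Lebesgue measure zero; where
$\{g_m(\cdot)\}_{m\in\mathcal{M}}$ are nonnegative Borel measurable functions from $\mathbb{R}^{d'}$; and
where $h(\cdot)$ is a nonnegative Borel measurable function from $\mathbb{R}^d$. Consequently,

\[
\begin{aligned}
\sum_{m\in\mathcal{K}} \pi_m\, f_{Y|M=m}(y) &= \sum_{m\in\mathcal{K}} \pi_m\, g_m(T(y))\, h(y) \\
&= \Bigl( \sum_{m\in\mathcal{K}} \pi_m\, g_m(T(y)) \Bigr) h(y), \quad y \notin \mathcal{Y}_0,
\end{aligned}
\tag{22.39a}
\]

and

\[
\begin{aligned}
\sum_{m\notin\mathcal{K}} \pi_m\, f_{Y|M=m}(y) &= \sum_{m\notin\mathcal{K}} \pi_m\, g_m(T(y))\, h(y) \\
&= \Bigl( \sum_{m\notin\mathcal{K}} \pi_m\, g_m(T(y)) \Bigr) h(y), \quad y \notin \mathcal{Y}_0.
\end{aligned}
\tag{22.39b}
\]

The factorization (22.39) of the densities in (22.37) proves that $T(\cdot)$ is also sufficient
for these two densities.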


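Note 22.4.5. The proposition also extends to more general partitions as follows.
Suppose that $T(\cdot)$ is sufficient for the family $\{f_{Y|M=m}(\cdot)\}_{m\in\mathcal{M}}$. Let
$\mathcal{K}_1, \ldots, \mathcal{K}_\kappa$ be disjoint nonempty subsets of $\mathcal{M}$ whose union is equal
to $\mathcal{M}$, and let the prior $\{\pi_m\}$ be such that

\[ \sum_{m\in\mathcal{K}_j} \pi_m > 0, \quad j \in \{1, \ldots, \kappa\}. \]

Then $T(\cdot)$ is sufficient for the $\kappa$ densities

\[
y \mapsto \sum_{m\in\mathcal{K}_1} \pi_m\, f_{Y|M=m}(y), \;\ldots,\;
y \mapsto \sum_{m\in\mathcal{K}_\kappa} \pi_m\, f_{Y|M=m}(y).
\]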


22.4.5 Conditionally Independent Observations 


Our next result deals with a situation where we need to guess $M$ based on two
observations: $Y_1$ and $Y_2$. We assume that $T_1(Y_1)$ forms a sufficient statistic for
guessing $M$ when only $Y_1$ is observed, and that $T_2(Y_2)$ forms a sufficient statistic
for guessing $M$ when only $Y_2$ is observed. It is tempting to conjecture that in
this case the pair $(T_1(Y_1), T_2(Y_2))$ must form a sufficient statistic for guessing $M$
when both $Y_1$ and $Y_2$ are observed. But, without additional assumptions, this is
not the case. An example where this fails can be constructed as follows. Let $M$
and $Z$ be independent with $M$ taking on the values 0 and 1 equiprobably and with
$Z \sim \mathcal{N}(0,1)$. Suppose that $Y_1 = M + Z$ and that $Y_2 = Z$. In this case the invertible
mapping $T_1(Y_1) = Y_1$ forms a sufficient statistic for guessing $M$ based on $Y_1$ alone,
and the mapping $T_2(Y_2) = 17$ forms a sufficient statistic for guessing $M$ based
on $Y_2$ alone (because $M$ and $Z$ are independent). Nevertheless, the pair $(Y_1, 17)$ is
not sufficient for guessing $M$ based on the pair $(Y_1, Y_2)$. Basing one's guess of $M$ on
$(Y_1, 17)$ is not as good as basing it on the pair $(Y_1, Y_2)$. (The reader is encouraged
to verify that $Y_1 - Y_2$ is sufficient for guessing $M$ based on $(Y_1, Y_2)$ and that $M$
can be guessed error-free from $Y_1 - Y_2$.)
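
The gap between the two approaches in this counterexample can be seen in a short simulation. The Python sketch below is only an illustration: the function name `simulate`, the sample size, and the seed are arbitrary choices made here. The guess based on $Y_1$ alone uses the threshold $1/2$, which is the MAP rule for equiprobable means 0 and 1 in unit-variance Gaussian noise; the guess based on the pair rounds $Y_1 - Y_2$ to the nearest integer to absorb floating-point rounding.

```python
import random
from typing import Tuple


def simulate(n: int, seed: int = 0) -> Tuple[float, float]:
    """Empirical error rates of guessing M from (Y1, 17) and from Y1 - Y2."""
    rng = random.Random(seed)
    errors_y1 = errors_diff = 0
    for _ in range(n):
        m = rng.randrange(2)            # M is 0 or 1 equiprobably
        z = rng.gauss(0.0, 1.0)         # Z ~ N(0, 1), independent of M
        y1, y2 = m + z, z
        guess_y1 = 1 if y1 > 0.5 else 0  # the constant 17 carries no information
        guess_diff = round(y1 - y2)      # Y1 - Y2 equals M up to rounding error
        errors_y1 += guess_y1 != m
        errors_diff += guess_diff != m
    return errors_y1 / n, errors_diff / n
```

Calling `simulate(20000)` yields an error rate of roughly 0.3 for the guess based on $Y_1$ alone and no errors at all for the guess based on $Y_1 - Y_2$.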


The additional assumption we need is that $Y_1$ and $Y_2$ be conditionally independent
given $M$. (It would make no sense to assume that they are independent, because
they are presumably both related to $M$.) This assumption is valid in many applications.
For example, it occurs when a signal is received at two different antennas
with the additive noises in the two antennas being independent.


Proposition 22.4.6 (Conditionally Independent Observations). Let the mapping
$T_1\colon \mathbb{R}^{d_1} \to \mathbb{R}^{d_1'}$ form a sufficient statistic for guessing $M$ based on the
observation $Y_1 \in \mathbb{R}^{d_1}$, and let $T_2\colon \mathbb{R}^{d_2} \to \mathbb{R}^{d_2'}$ form a sufficient
statistic for guessing $M$ based on the observation $Y_2 \in \mathbb{R}^{d_2}$. If $Y_1$ and $Y_2$ are
conditionally independent given $M$, then the pair $(T_1(Y_1), T_2(Y_2))$ forms a sufficient
statistic for guessing $M$ based on the pair $(Y_1, Y_2)$.


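Proof. The proof we offer is based on the Factorization Theorem. The hypothesis
that $T_1\colon \mathbb{R}^{d_1} \to \mathbb{R}^{d_1'}$ forms a sufficient statistic for guessing $M$ based on
the observation $Y_1$ implies the existence of nonnegative functions
$\{g_m^{(1)}\}_{m\in\mathcal{M}}$ and $h^{(1)}$ and a subset $\mathcal{Y}_0^{(1)} \subset \mathbb{R}^{d_1}$ of
Lebesgue measure zero such that

\[ f_{Y_1|M=m}(y_1) = g_m^{(1)}(T_1(y_1))\, h^{(1)}(y_1), \quad m \in \mathcal{M},\; y_1 \notin \mathcal{Y}_0^{(1)}. \tag{22.40} \]

Similarly, the hypothesis that $T_2(\cdot)$ is sufficient for guessing $M$ based on $Y_2$
implies the existence of nonnegative functions $\{g_m^{(2)}\}_{m\in\mathcal{M}}$ and $h^{(2)}$ and a
subset of Lebesgue measure zero $\mathcal{Y}_0^{(2)} \subset \mathbb{R}^{d_2}$ such that

\[ f_{Y_2|M=m}(y_2) = g_m^{(2)}(T_2(y_2))\, h^{(2)}(y_2), \quad m \in \mathcal{M},\; y_2 \notin \mathcal{Y}_0^{(2)}. \tag{22.41} \]

The conditional independence of $Y_1$ and $Y_2$ given $M$ implies⁷

\[
f_{Y_1,Y_2|M=m}(y_1, y_2) = f_{Y_1|M=m}(y_1)\, f_{Y_2|M=m}(y_2),
\quad m \in \mathcal{M},\; y_1 \in \mathbb{R}^{d_1},\; y_2 \in \mathbb{R}^{d_2}. \tag{22.42}
\]

Combining (22.40), (22.41), and (22.42), we obtain

\[
f_{Y_1,Y_2|M=m}(y_1, y_2) =
\underbrace{g_m^{(1)}(T_1(y_1))\, g_m^{(2)}(T_2(y_2))}_{g_m(T_1(y_1),\,T_2(y_2))}\;
\underbrace{h^{(1)}(y_1)\, h^{(2)}(y_2)}_{h(y_1,\,y_2)},
\quad m \in \mathcal{M},\; y_1 \notin \mathcal{Y}_0^{(1)},\; y_2 \notin \mathcal{Y}_0^{(2)}. \tag{22.43}
\]

The set of pairs $(y_1, y_2) \in \mathbb{R}^{d_1} \times \mathbb{R}^{d_2}$ for which $y_1$ is in
$\mathcal{Y}_0^{(1)}$ and/or $y_2$ is in $\mathcal{Y}_0^{(2)}$ is of Lebesgue measure zero, and
consequently, the factorization (22.43) implies that the pair $(T_1(Y_1), T_2(Y_2))$ forms a
sufficient statistic for guessing $M$ based on $(Y_1, Y_2)$.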


22.5 Irrelevant Data 


Closely related to the notion of sufficient statistics is the notion of irrelevant data. 
This notion is particularly useful when we think about the data as consisting of 
two parts. Heuristically speaking, we say that the second part of the data is 
irrelevant for guessing M given the first, if it adds no information about M that 
is not already contained in the first part. In such cases the second part of the 
data can be ignored. It should be emphasized that the question whether a part 
of the observation is irrelevant depends not only on its dependence on the random 
variable to be guessed but also on the other part of the observation. 


Definition 22.5.1 (Irrelevant Data). We say that R is irrelevant for guessing M 
given Y, if Y forms a sufficient statistic for guessing M based on (Y, R). 


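Equivalently, $R$ is irrelevant for guessing $M$ given $Y$, if for any prior $\{\pi_m\}$ on $\mathcal{M}$

\[ M -\!\circ\!- Y -\!\circ\!- (Y, R), \tag{22.44} \]

or, equivalently,

\[ M -\!\circ\!- Y -\!\circ\!- R. \tag{22.45} \]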


⁷Technically speaking, this must only hold outside a set of Lebesgue measure zero, but we do
not want to make things even more cumbersome.

Example 22.5.2. Let $H$ take on the values 0 and 1, and assume that, conditional
on $H = 0$, the observation $Y$ is $\mathcal{N}(0, \sigma_0^2)$ and that, conditional on $H = 1$, it is
$\mathcal{N}(0, \sigma_1^2)$. Rather than thinking of this problem as a decision problem with a single
observation, let us think of it as a decision problem with two observations $(Y_1, Y_2)$,
where $Y_1$ is the absolute value of $Y$, and where $Y_2$ is the sign of $Y$. Thus $Y = Y_1 Y_2$,
where $Y_1 \ge 0$ and $Y_2 \in \{+1, -1\}$. (The probability that $Y = 0$ is zero under each
hypothesis, so we need not define the sign of zero.) We now show that $Y_2$ (= the
sign of $Y$) is irrelevant data for guessing $H$ given $Y_1$ (= the magnitude of $Y$). Or,
in other words, the magnitude of $Y$ is a sufficient statistic for guessing $H$ based on
$(Y_1, Y_2)$. Indeed the likelihood-ratio function

\[
\begin{aligned}
\mathrm{LR}(y_1, y_2) &= \frac{f_{Y_1,Y_2|H=0}(y_1, y_2)}{f_{Y_1,Y_2|H=1}(y_1, y_2)} \\
&= \frac{\dfrac{1}{\sqrt{2\pi\sigma_0^2}}\, \exp\Bigl(-\dfrac{(y_1 y_2)^2}{2\sigma_0^2}\Bigr)}{\dfrac{1}{\sqrt{2\pi\sigma_1^2}}\, \exp\Bigl(-\dfrac{(y_1 y_2)^2}{2\sigma_1^2}\Bigr)} \\
&= \frac{\sigma_1}{\sigma_0}\, \exp\Bigl(\frac{y_1^2}{2\sigma_1^2} - \frac{y_1^2}{2\sigma_0^2}\Bigr)
\end{aligned}
\]

can be computed from the magnitude $y_1$ only, so $Y_1$ is a sufficient statistic for
guessing $H$ based on $(Y_1, Y_2)$.
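
The claim that the likelihood ratio depends on the magnitude only can be checked numerically. In the Python sketch below the function names and the numerical values of the standard deviations in the usage note are illustrative assumptions.

```python
import math


def gaussian_pdf(y: float, sigma: float) -> float:
    """Density of a centered Gaussian of standard deviation sigma."""
    return math.exp(-y * y / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)


def lr_full(y1: float, y2: int, sigma0: float, sigma1: float) -> float:
    """Likelihood ratio evaluated from both the magnitude y1 and the sign y2."""
    y = y1 * y2
    return gaussian_pdf(y, sigma0) / gaussian_pdf(y, sigma1)


def lr_from_magnitude(y1: float, sigma0: float, sigma1: float) -> float:
    """Closed form of the likelihood ratio, which involves the magnitude only."""
    return (sigma1 / sigma0) * math.exp(
        y1 * y1 / (2 * sigma1 * sigma1) - y1 * y1 / (2 * sigma0 * sigma0))
```

For instance, with standard deviations 1 and 2, `lr_full(1.3, +1, 1.0, 2.0)`, `lr_full(1.3, -1, 1.0, 2.0)`, and `lr_from_magnitude(1.3, 1.0, 2.0)` agree up to floating-point rounding.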


The following two notes clarify that the notion of irrelevance is different from that 
of statistical independence. Neither implies the other. 


Note 22.5.3. A RV can be independent of the RV that we wish to guess and yet
not be irrelevant.


Proof. We provide an example of a RV $R$ that is independent of the RV $H$ that
we wish to guess and that is nonetheless not irrelevant. Suppose that $H$ takes on
the values 0 and 1, and assume that under both hypotheses $Y \sim$ Bernoulli(1/2):

\[ \Pr[Y = 1 \mid H = 0] = \Pr[Y = 1 \mid H = 1] = \frac{1}{2}. \]

Further assume that under $H = 0$ the RV $R$ is given by $0 \oplus Y = Y$, whereas under
$H = 1$ it is given by $1 \oplus Y$. Here $\oplus$ denotes the exclusive-or operation or mod-2
addition.


The distribution of $R$ does not depend on the hypothesis; it is Bernoulli(1/2) both
conditional on $H = 0$ and conditional on $H = 1$. But $R$ is not irrelevant for
guessing $H$ given $Y$. In fact, if we had to guess $H$ based on $Y$ only, our probability
of error would be 1/2. But if we base our decision on $Y$ and $R$, then our probability
of error is zero because

\[ H = Y \oplus R. \]
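
Both claims of this example can be confirmed by enumeration. The short Python sketch below uses hypothetical function names and exact rational arithmetic.

```python
from fractions import Fraction


def pmf_of_R_given_H(h: int) -> dict:
    """Conditional PMF of R = h XOR Y when Y is Bernoulli(1/2)."""
    pmf = {0: Fraction(0), 1: Fraction(0)}
    for y in (0, 1):
        pmf[h ^ y] += Fraction(1, 2)
    return pmf


def recover_H(y: int, r: int) -> int:
    """Guess H from the pair (Y, R) via H = Y XOR R."""
    return y ^ r
```

The conditional PMF of $R$ is the same under both hypotheses, so $R$ is independent of $H$; yet `recover_H(y, h ^ y)` returns `h` for every combination of `h` and `y`, so the pair $(Y, R)$ determines $H$.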


Note 22.5.4. A RV can be irrelevant even if it is statistically dependent on the
RV that we wish to guess.

Proof. As an example, consider the case where $R$ is equal to $Y$ with probability one
and where $Y$ (and hence also $R$) is statistically dependent on the RV $M$ that we wish
to guess. Since $R$ is deterministically equal to $Y$, it follows that, conditional on $Y$,
the random variable $R$ is deterministic. Consequently, since a deterministic RV is
independent of every RV, it follows that $M$ and $R$ are conditionally independent
given $Y$, i.e., that (22.45) holds. Thus, even though in this example $R$ is statistically
dependent on $M$, it is irrelevant for guessing $M$ given $Y$. The intuitive explanation
is that, in this example, $R$ is irrelevant for guessing $M$ given $Y$ not because it
conveys no information about $M$ (it does!) but because it conveys no information
about $M$ that is not already conveyed by $Y$.


Condition (22.44) is often difficult to establish directly, especially when the
distribution of the pair $(R, Y)$ is specified in terms of its conditional density given $M$,
because in this case the conditional law of $(M, R)$ given $Y$ can be unwieldy. In
some cases the following proposition can be used to establish that $R$ is irrelevant.


Proposition 22.5.5 (A Condition that Implies Irrelevance). Suppose that the con- 
ditional law of R given M =m does not depend on m and that, for each m € M, 
we have that, conditionally on M =m, the observations Y and R are independent. 
Then R is irrelevant for guessing M given Y. 


Proof. We provide the proof for the case where the pair (Y, R) has a conditional 
density given M. The discrete case or the mixed case (where one has a conditional 
density and the other a conditional PMF) can be treated with the same approach. 
To prove this proposition we shall use the Factorization Theorem to demonstrate
that Y is a sufficient statistic for guessing M based on (Y, R). To that end, we
express the conditional density of (Y, R) as


f_{Y,R|M=m}(y, r) = f_{Y|M=m}(y) f_{R|M=m}(r)
                  = f_{Y|M=m}(y) f_R(r)
                  = g_m(y) h(y, r),                    (22.46)


where the first equality follows from the conditional independence of Y and R
given M; the second from the hypothesis that the conditional density of R given
M = m does not depend on m and by denoting this density by f_R(·); and the
final equality follows by defining g_m(y) = f_{Y|M=m}(y) and h(y, r) = f_R(r). The
factorization (22.46) demonstrates that Y forms a sufficient statistic for guessing M
based on (Y, R), i.e., that R is irrelevant for guessing M given Y.
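The proposition can be sanity-checked on a small discrete model, where conditional PMFs play the role of the densities in (22.46). The sketch below is only an illustration: the prior, the conditional PMFs, and the function names are hypothetical choices of ours.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model; all numbers are illustrative assumptions.
prior = {0: Fraction(1, 3), 1: Fraction(2, 3)}
p_y_given_m = {0: {0: Fraction(3, 4), 1: Fraction(1, 4)},
               1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}
p_r = {0: Fraction(2, 7), 1: Fraction(5, 7)}  # law of R, the same for every m

def posterior_given_y(y):
    """A posteriori law of M given Y = y."""
    weights = {m: prior[m] * p_y_given_m[m][y] for m in prior}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

def posterior_given_y_and_r(y, r):
    """A posteriori law of M given (Y, R) = (y, r), using the product form."""
    weights = {m: prior[m] * p_y_given_m[m][y] * p_r[r] for m in prior}
    total = sum(weights.values())
    return {m: w / total for m, w in weights.items()}

# R is irrelevant: observing it never changes the posterior of M.
irrelevant = all(posterior_given_y(y) == posterior_given_y_and_r(y, r)
                 for y, r in product((0, 1), repeat=2))
print(irrelevant)  # True
```

The factor p_r[r] is common to all hypotheses and cancels in the normalization, which is the discrete counterpart of taking h(y, r) = f_R(r).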


22.6 Testing with Random Parameters 


The notions of sufficient statistics and irrelevance also apply when testing in the
presence of a random parameter. If the random parameter Θ is not observed,
then T(Y) is sufficient if, and only if, for any prior {π_m} on M

M ⊸− T(Y) ⊸− Y.                    (22.47)

If Θ is of density f_Θ(·) and independent of M, then, as in (20.101), we can express
the conditional density of Y given M = m as

f_{Y|M=m}(y) = \int f_{Y|Θ=θ,M=m}(y) f_Θ(θ) dθ,

so T(·) forms a sufficient statistic if, and only if, it forms a sufficient statistic for
the M densities

\Bigl\{ y \mapsto \int f_{Y|Θ=θ,M=m}(y) f_Θ(θ) dθ \Bigr\}_{m ∈ ℳ}.

Similarly, R is irrelevant for guessing M given Y if, and only if,

M ⊸− Y ⊸− R

forms a Markov chain for every prior {π_m} on M.


If the parameter Θ is observed, then T(Y, Θ) is a sufficient statistic if, and only if,
for any prior {π_m} on M

M ⊸− T(Y, Θ) ⊸− (Y, Θ).

If Θ is independent of M and of density f_Θ(·), then the density f_{Y,Θ|M=m}(·) can
be expressed, as in (20.104), as

f_{Y,Θ|M=m}(y, θ) = f_Θ(θ) f_{Y|Θ=θ,M=m}(y),

so T(·) forms a sufficient statistic if, and only if, it forms a sufficient statistic for
the M densities

\bigl\{ (y, θ) \mapsto f_Θ(θ) f_{Y|Θ=θ,M=m}(y) \bigr\}_{m ∈ ℳ}.

Similarly, R is irrelevant for guessing M given (Y, Θ) if, and only if,

M ⊸− (Y, Θ) ⊸− R.


The following lemma provides an easily-verifiable condition that guarantees that R
is irrelevant for guessing M based on Y, irrespective of whether the random
parameter is observed or not.


Lemma 22.6.1. If for any prior {π_m} on M we have that R is independent of the
triplet (M, Θ, Y),⁸ then R is irrelevant for guessing M given (Θ, Y) and also for
guessing M given Y.


Proof. To prove the lemma when Θ is observed, we need to show that the
independence of R and the triplet (M, Θ, Y) implies

M ⊸− (Y, Θ) ⊸− R,

i.e., that the conditional distribution of R given (Y, Θ) is the same as given
(M, Y, Θ). This is indeed the case because R is independent of (M, Y, Θ) so
the two conditional distributions are equal to the unconditional distribution of R.

⁸ Note that being independent of the triplet is a stronger condition than being independent of
each of the members of the triplet!


To prove the lemma in the case where Θ is unobserved, we need to show that the
independence of R and the triplet (M, Θ, Y) implies that

M ⊸− Y ⊸− R.

Again, one can do so by noting that the conditional distribution of R given Y is
equal to the conditional distribution of R given (Y, M) because both are equal to
the unconditional distribution of R.


22.7 Additional Reading 


The classical definition of sufficient statistic as a mapping T(·) such that the
distribution of Y given (T(Y), M = m) does not depend on m is due to R. A. Fisher.
A. N. Kolmogorov defined T(·) to be sufficient if for every prior {π_m} the a
posteriori distribution of M given Y can be computed from T(Y). In our setting,
where M takes on a finite number of values, the two definitions are equivalent. For
an example where the definitions differ, see (Blackwell and Ramamoorthi, 1982).


For a discussion of pairwise sufficiency and its relation to sufficiency, see (Halmos 
and Savage, 1949). 


22.8 Exercises 


Exercise 22.1 (Another Proof of Proposition 22.4.6). Give an alternative proof of Propo- 
sition 22.4.6 using Theorem 22.3.5. 


Exercise 22.2 (Hypothesis Testing with Two Observations). Let H take on the values 0
and 1 equiprobably. Let Y_1 be a random vector taking value in R^2, and let Y_2 be a
random variable. Conditional on H = 0,

Y_1 = μ + Z_1,    Y_2 = α + Z_2,

and, conditional on H = 1,

Y_1 = −μ + Z_1,    Y_2 = −α + Z_2.

Here H, Z_1, and Z_2 are independent with the components of Z_1 being IID N(0, 1),
with Z_2 being a mean-one exponential, and with μ ∈ R^2 and α ∈ R being deterministic.

(i) Find an optimal rule for guessing H based on Y_1. Find a one-dimensional sufficient
statistic.
(ii) Find an optimal rule for guessing H based on Y_2.
(iii) Find a two-dimensional sufficient statistic (T_1, T_2) for guessing H based on (Y_1, Y_2).
(iv) Find an optimal rule for guessing H based on the pair (T_1, T_2).
Exercise 22.3 (Sufficient Statistics and the Bhattacharyya Bound). Show that if the
mapping T : R^d → R^{d'} is a sufficient statistic for the densities f_{Y|H=0}(·) and f_{Y|H=1}(·),
and if T = T(Y) is of conditional densities f_{T|H=0}(·) and f_{T|H=1}(·), then

\frac{1}{2} \int_{R^d} \sqrt{f_{Y|H=0}(y) f_{Y|H=1}(y)} dy = \frac{1}{2} \int_{R^{d'}} \sqrt{f_{T|H=0}(t) f_{T|H=1}(t)} dt.

Hint: You may want to first derive the identity

\int_{R^d} \sqrt{f_{Y|H=0}(y) f_{Y|H=1}(y)} dy = E\Bigl[ \Bigl( \frac{f_{Y|H=0}(Y)}{f_{Y|H=1}(Y)} \Bigr)^{1/2} \Bigm| H = 1 \Bigr].


Exercise 22.4 (Sufficient Statistics and Irrelevant Data).

(i) Show that if the hypotheses of Proposition 22.5.5 are satisfied, then the random
variables Y and R must be independent also when one does not condition on M.

(ii) Show that the conditions for irrelevance in that proposition are not necessary.

Exercise 22.5 (Two More Characterizations of Sufficient Statistics). Let P_{Y|H=0}(·) and
P_{Y|H=1}(·) be probability mass functions on the finite set 𝒴. We say that T(Y) forms a
sufficient statistic for guessing H based on Y if H ⊸− T(Y) ⊸− Y for every prior on H.
Show that each of the following conditions is equivalent to T(Y) forming a sufficient
statistic for guessing H based on Y:

(a) For every y ∈ 𝒴 satisfying P_{Y|H=0}(y) + P_{Y|H=1}(y) > 0 we have

\frac{P_{Y|H=0}(y)}{P_{Y|H=1}(y)} = \frac{P_{T|H=0}(T(y))}{P_{T|H=1}(T(y))},

where we adopt the convention (20.39).

(b) For every prior (π_0, π_1) on H there exists a decision rule that bases its decision on
π_0, π_1, and T(Y) and that is optimal for guessing H based on Y.


Exercise 22.6 (Pairwise Sufficiency Implies Sufficiency). Prove Proposition 22.3.2 in the 
case where the conditional densities of the observable given each of the hypotheses are 
positive. 


Exercise 22.7 (Simulating the Observable). In all the examples we gave in Section 22.3.4
the random vector Y was generated from T(y_obs) uniformly over the set of vectors ξ
in R^d satisfying T(ξ) = T(y_obs). Provide an example where this is not the case.


Hint: The setup of Proposition 22.5.5 might be useful. 


Exercise 22.8 (Densities with Zeros). Conditional on H = 0, the d components of Y are
IID and uniformly distributed over the interval [α_0, β_0]. Conditional on H = 1, they are
IID and uniformly distributed over the interval [α_1, β_1]. Show that the tuple

(max{y^{(1)}, …, y^{(d)}}, min{y^{(1)}, …, y^{(d)}})

forms a sufficient statistic for guessing H based on Y.
Exercise 22.9 (Optimality Does Not Imply Sufficiency). Let H take value in the set
{0, 1}, and let d = 2. Suppose that

Y_j = (1 − 2H) + Θ Z_j,    j = 1, …, d,

where H, Θ, Z_1, …, Z_d are independent with Θ taking on the distinct positive values σ_0
and σ_1 with probability p_0 and p_1 respectively, and with Z_1, …, Z_d being IID N(0, 1).
Let T = \sum_{j=1}^{d} Y_j.

(i) Show that T forms a sufficient statistic for guessing H based on Y_1, …, Y_d when Θ
is observed.

(ii) Show that T does not form a sufficient statistic for guessing H based on Y_1, …, Y_d
when Θ is not observed.

(iii) Show that notwithstanding Part (ii), if H has a uniform prior, then the decision
rule that guesses “H = 0” whenever T > 0 is optimal both when Θ is observed and
when it is not observed.


Exercise 22.10 (Markovity Implies Markovity). Suppose that for every prior on M

(M, A) ⊸− T(Y) ⊸− Y

forms a Markov chain, where M takes value in the set ℳ = {1, …, M}, where A and Y
are random vectors, and where T(·) is Borel measurable. Does this imply that T(·) forms
a sufficient statistic for guessing M based on Y?


Chapter 23 


The Multivariate Gaussian Distribution 


23.1 Introduction


The multivariate Gaussian distribution is arguably the most important multi- 
variate distribution in Digital Communications. It is the extension of the univariate 
Gaussian distribution from scalars to vectors. A random vector of this distribu- 
tion is said to be a Gaussian vector, and its components are said to be jointly 
Gaussian. In this chapter we shall define this distribution, provide some useful 
characterizations, and study some of its key properties. To emphasize its con- 
nection to the univariate distribution, we shall derive it along the same lines we 
followed in deriving the univariate Gaussian distribution in Chapter 19. 


There are a number of equivalent ways to define the multivariate Gaussian distri- 
bution, and authors typically pick one definition and then proceed over the course 
of numerous pages to derive alternate characterizations. We shall also proceed in 
this way, but to satisfy the impatient reader’s curiosity we shall state the various 
equivalent definitions in this section. The proof of their equivalence will be spread 
over the whole chapter. 


In the following definition we use the notation introduced in Section 17.2. In
particular, all vectors are column vectors, and we denote the components of the
vector a ∈ R^n by a^{(1)}, …, a^{(n)}.


Definition 23.1.1 (Standard Gaussians, Centered Gaussians, and Gaussians).

(i) A random vector W taking value in R^n is said to be a standard Gaussian
if its n components W^{(1)}, …, W^{(n)} are independent and each is a zero-mean
unit-variance univariate Gaussian.

(ii) A random vector X taking value in R^n is said to be a centered Gaussian
if there exists some deterministic n × m matrix A such that the distribution
of X is the same as the distribution of AW, i.e.,

X \stackrel{\mathscr{L}}{=} AW,                    (23.1)

where W is a standard Gaussian with m components.

(iii) A random vector X taking value in R^n is said to be Gaussian if there exists
some deterministic n × m matrix A and some deterministic vector μ ∈ R^n
such that the distribution of X is equal to the distribution of AW + μ, i.e., if

X \stackrel{\mathscr{L}}{=} AW + μ,                    (23.2)

where W is a standard Gaussian with m components.


The random vectors AW + μ and X can have identical laws only if they have
identical mean vectors. As we shall see, the linearity of expectation and the fact
that a standard Gaussian is of zero mean imply that the mean vector of AW + μ
is equal to μ. Thus, AW + μ and X can have identical laws only if μ = E[X].
Consequently, X is a Gaussian random vector if, and only if, for some A and W
as above X has the same law as AW + E[X]. Stated differently, X is a Gaussian
random vector if, and only if, X − E[X] is a centered Gaussian.
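Definition 23.1.1 doubles as a recipe for generating Gaussian vectors: draw a standard Gaussian W, multiply by A, and add the mean vector. The following sketch does this with the standard library only; the matrix, the mean vector, the seed, and the function name are illustrative assumptions of ours.

```python
import random

def sample_gaussian_vector(A, mu, rng):
    """Draw one realization of A W + mu, where W is a standard Gaussian."""
    m = len(A[0])
    w = [rng.gauss(0.0, 1.0) for _ in range(m)]  # IID N(0, 1) components
    return [sum(a_jk * w_k for a_jk, w_k in zip(row, w)) + mu_j
            for row, mu_j in zip(A, mu)]

# Hypothetical 2 x 3 matrix and mean vector, chosen only for illustration.
A = [[1.0, 0.5, 0.0],
     [0.0, 2.0, -1.0]]
mu = [3.0, -1.0]

rng = random.Random(0)  # fixed seed for reproducibility
num_samples = 20000
samples = [sample_gaussian_vector(A, mu, rng) for _ in range(num_samples)]
empirical_mean = [sum(x[j] for x in samples) / num_samples
                  for j in range(len(mu))]
print(empirical_mean)  # close to mu, up to sampling noise
```

The empirical mean approaches the added vector as the number of samples grows, in line with the observation that the added vector must equal E[X].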


While Definition 23.1.1 allows for the matrix A to be rectangular, we shall see in 
Corollary 23.6.13 that every centered Gaussian can be generated from a standard 
Gaussian by multiplication by a square matrix. That is, if X is an n-dimensional 
centered Gaussian, then there exists an n x n square matrix A such that X = AW, 
where W is a standard Gaussian. 


In fact, we shall see in Theorem 23.6.14 that we can even limit ourselves to square 
matrices that are the product of an orthogonal matrix by a diagonal matrix. Since 
multiplying W by a diagonal matrix merely scales its components while leaving 
them independent and Gaussian, it follows that X is a centered Gaussian if, and 
only if, its law is the same as the law of the result of applying an orthogonal 
transformation to a random vector whose components are independent zero-mean 
univariate Gaussians (not necessarily of equal variance). 


In view of Definition 23.1.1, it is not surprising that applying a linear transfor- 
mation to a Gaussian vector results in a Gaussian vector ((23.43) ahead). The 
reverse is perhaps more surprising: X is a Gaussian vector if, and only if, the re- 
sult of applying any deterministic linear functional to X has a univariate Gaussian 
distribution (Theorem 23.6.17 ahead). 


We conclude this section with the following pact with the reader. 


(i) Unless preceded by the word “random” or “Gaussian,” all scalars, vectors, 
and matrices in this chapter are deterministic. 


(ii) Unless preceded by the word “complex,” all scalars, vectors, and matrices in 
this chapter are real. 


But, without violating this pact, we shall sometimes get excited and throw in the 
words “real” and “deterministic” even when unnecessary. 


23.2 Notation and Preliminaries 


Our notation in this chapter expands upon the one introduced in Section 17.2. To 
minimize page flipping, we repeat here parts of that section. 
Deterministic vectors are denoted by boldface lowercase letters such as w, whereas
random vectors are denoted by boldface uppercase letters such as W. When we 
deal with deterministic matrices we make an exception to our rule of trying to 
denote deterministic quantities by lowercase letters.¹ Thus, deterministic matrices
are denoted by uppercase letters. But to make it clear that we are dealing with 
a deterministic matrix and not a scalar random variable, we use special fonts to 
distinguish the two. Thus A denotes a deterministic matrix, whereas A denotes a 
random variable. Random matrices, which only appear briefly in this book, are 
denoted by uppercase letters of yet another font, e.g., H. 


An n × m deterministic real matrix A is an array of real numbers having n rows
and m columns

A = \begin{pmatrix}
  a^{(1,1)} & a^{(1,2)} & \cdots & a^{(1,m)} \\
  a^{(2,1)} & a^{(2,2)} & \cdots & a^{(2,m)} \\
  \vdots    & \vdots    & \ddots & \vdots    \\
  a^{(n,1)} & a^{(n,2)} & \cdots & a^{(n,m)}
\end{pmatrix}.

The Row-j Column-ℓ element of the matrix A is denoted

a^{(j,ℓ)} or [A]_{j,ℓ}.

The transpose of an n × m matrix A is the m × n matrix A^T whose Row-j Column-ℓ
entry is equal to the Row-ℓ Column-j entry of A:

[A^T]_{j,ℓ} = [A]_{ℓ,j},    j ∈ {1, …, m},  ℓ ∈ {1, …, n}.


We shall repeatedly use the fact that if the matrix-product AB is defined (i.e., if
the number of columns of A is the same as the number of rows of B), then the
transpose of the product is the product of the transposes in reverse order

(AB)^T = B^T A^T.                    (23.3)

The n × n identity matrix whose diagonal elements are all 1 and whose
off-diagonal elements are all 0 is denoted I_n. The all-zero matrix whose components
are all zero is denoted 0.


An n × 1 matrix is an n-vector, or a vector for short. Thus, unless otherwise
specified, all the vectors we shall encounter are column vectors.² The components
of an n-vector a are denoted by a^{(1)}, …, a^{(n)} so

a = \begin{pmatrix} a^{(1)} \\ \vdots \\ a^{(n)} \end{pmatrix},

or, in a typographically more efficient form,

a = (a^{(1)}, …, a^{(n)})^T.

¹ We have already made some exceptions to this rule when we dealt with deterministic
constants that are by convention always denoted using uppercase letters, e.g., bandwidth W,
amplitude A, baud period T_s, etc.

² An exception to this rule is in our treatment of linear codes where the tradition of using row
vectors is too strong to change.
The vector whose components are all zero is denoted by 0. The square root of the
sum of the squares of the components of a real n-vector a is denoted by ‖a‖:

‖a‖ = \sqrt{\sum_{ℓ=1}^{n} (a^{(ℓ)})^2},    a ∈ R^n.                    (23.4)

If a = (a^{(1)}, …, a^{(n)})^T and b = (b^{(1)}, …, b^{(n)})^T, then³

a^T b = \sum_{ℓ=1}^{n} a^{(ℓ)} b^{(ℓ)}.

In particular,

‖a‖^2 = \sum_{ℓ=1}^{n} (a^{(ℓ)})^2
      = a^T a.                    (23.5)

Note the difference between a^T a and a a^T: the former is the scalar ‖a‖^2 whereas
the latter is the n × n matrix whose Row-j Column-ℓ element is a^{(j)} a^{(ℓ)}.


The determinant of a square matrix A is denoted by det A. We note that a matrix 
and its transpose have equal determinants 


det(A^T) = det A,                    (23.6)


and that the determinant of the product of two square matrices is the product of 
the determinants 


det (AB) = det (A) det (B). (23.7) 


We say that a square n × n matrix A is singular if its determinant is zero or,
equivalently, if its columns are linearly dependent or, equivalently, if its rows are
linearly dependent or, equivalently, if there exists some nonzero vector α ∈ R^n
such that Aα = 0.


23.3 Some Results on Matrices


We next survey some of the results from Matrix Theory that we shall be using. Par- 
ticularly important to us are results on positive semidefinite matrices, because, as 
we shall see in Proposition 23.6.1, every covariance matrix is positive semidefinite, 
and every positive semidefinite matrix is the covariance matrix of some random 
vector. 


³ In (20.84) we denoted a^T b by ⟨a, b⟩_E.




23.3.1 Orthogonal Matrices 


Definition 23.3.1 (Orthogonal Matrices). An n × n real matrix U is said to be
orthogonal if

U U^T = I_n.                    (23.8)

As proved in (Axler, 1997, Chapter 7, Theorem 7.36), the condition (23.8) is
equivalent to the condition

U^T U = I_n.                    (23.9)


Thus, a real matrix is orthogonal if, and only if, its transpose is orthogonal. From 
(23.8) and (23.9) we also obtain: 


Note 23.3.2. The inverse of an orthogonal matrix is its transpose. 


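If we write an n × n matrix U in terms of its columns as

U = \begin{pmatrix} ψ_1 & ψ_2 & \cdots & ψ_n \end{pmatrix},

then (23.9) can be expressed as

I_n = U^T U
    = \begin{pmatrix} ψ_1^T \\ ψ_2^T \\ \vdots \\ ψ_n^T \end{pmatrix}
      \begin{pmatrix} ψ_1 & ψ_2 & \cdots & ψ_n \end{pmatrix}
    = \begin{pmatrix}
        ψ_1^T ψ_1 & ψ_1^T ψ_2 & \cdots & ψ_1^T ψ_n \\
        ψ_2^T ψ_1 & ψ_2^T ψ_2 & \cdots & ψ_2^T ψ_n \\
        \vdots    & \vdots    & \ddots & \vdots    \\
        ψ_n^T ψ_1 & ψ_n^T ψ_2 & \cdots & ψ_n^T ψ_n
      \end{pmatrix},

thus showing that a real n × n matrix U is orthogonal if, and only if, its n columns
ψ_1, …, ψ_n satisfy

ψ_ν^T ψ_{ν'} = I{ν = ν'},    ν, ν' ∈ {1, …, n}.                    (23.10)

Using the same argument but starting with (23.8) we can prove a similar result
about the rows of an orthogonal matrix: if the rows of a real n × n matrix U are
denoted by φ_1^T, …, φ_n^T, i.e.,

U = \begin{pmatrix} φ_1^T \\ \vdots \\ φ_n^T \end{pmatrix},

then U is orthogonal if, and only if,

φ_ν^T φ_{ν'} = I{ν = ν'},    ν, ν' ∈ {1, …, n}.                    (23.11)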


Recalling that the determinant of a product of square matrices is the product of 
the determinants and that the determinant of a matrix is equal to the determinant 
of its transpose, we obtain that for every square matrix U 


det(U U^T) = (det U)^2.                    (23.12)


Consequently, by taking the determinant of both sides of (23.8) we obtain that the 
determinant of an orthogonal matrix must be either +1 or —1. It should, however, 
be noted that there are numerous examples of matrices of unit determinant that 
are not orthogonal. 


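We leave it to the reader to verify that a 2 × 2 matrix is orthogonal if, and only if,
it is equal to one of the following matrices for some choice of −π ≤ θ < π

\begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix},
\qquad
\begin{pmatrix} \cos θ & \sin θ \\ \sin θ & -\cos θ \end{pmatrix}.                    (23.13)

The former matrix corresponds to a rotation by θ and has determinant +1, and
the latter to a reflection followed by a rotation

\begin{pmatrix} \cos θ & \sin θ \\ \sin θ & -\cos θ \end{pmatrix}
  = \begin{pmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{pmatrix}
    \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}

and has determinant −1.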
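The claims about the two families in (23.13) are easy to check numerically. The sketch below verifies orthogonality through the condition (23.8), computes the determinants, and confirms the factorization of the second family into a rotation times a reflection; the angle and the helper names are arbitrary choices of ours.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(column) for column in zip(*A)]

def rotation(theta):
    """First matrix in (23.13): rotation by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def reflected_rotation(theta):
    """Second matrix in (23.13): a reflection followed by a rotation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [s, -c]]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def is_orthogonal(U, tol=1e-12):
    """Check U U^T = I entrywise, up to rounding."""
    P = matmul(U, transpose(U))
    n = len(U)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))

theta = 0.7  # arbitrary angle in [-pi, pi)
R, F = rotation(theta), reflected_rotation(theta)
product_form = matmul(R, [[1.0, 0.0], [0.0, -1.0]])  # rotation times reflection
print(is_orthogonal(R), round(det2(R), 12))
print(is_orthogonal(F), round(det2(F), 12))

# A matrix of unit determinant that is not orthogonal.
shear = [[1.0, 1.0], [0.0, 1.0]]
print(is_orthogonal(shear), det2(shear))
```

The shear matrix at the end illustrates the earlier remark that unit determinant alone does not imply orthogonality.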


23.3.2 Symmetric Matrices 
A matrix A is said to be symmetric if it is equal to its transpose:

A^T = A.

Only square matrices can be symmetric. A vector ψ ∈ R^n is said to be an
eigenvector of the matrix A corresponding to the real eigenvalue λ ∈ R if ψ is nonzero
and if Aψ = λψ. The following is a key result about the eigenvectors of symmetric
real matrices.


Proposition 23.3.3 (Eigenvectors and Eigenvalues of Symmetric Real Matrices).
If A is a symmetric real n × n matrix, then A has n (not necessarily distinct)
real eigenvalues λ_1, …, λ_n ∈ R with corresponding eigenvectors ψ_1, …, ψ_n ∈ R^n
satisfying

ψ_ν^T ψ_{ν'} = I{ν = ν'},    ν, ν' ∈ {1, …, n}.                    (23.14)


Proof. See, for example, (Axler, 1997, Chapter 7, Theorem 7.13, p. 136), or (Her- 
stein, 2001, Section 6.10, pp. 346-348), or (Horn and Johnson, 1985, Chapter 4, 
Section 1, Theorem 4.5.1). 
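The vectors ψ_1, …, ψ_n are eigenvectors of the matrix A corresponding to the
eigenvalues λ_1, …, λ_n if

Aψ_ν = λ_ν ψ_ν,    ν ∈ {1, …, n}.                    (23.15)

We next express this in an alternative way. We begin by noting that

A \begin{pmatrix} ψ_1 & \cdots & ψ_n \end{pmatrix}
  = \begin{pmatrix} Aψ_1 & \cdots & Aψ_n \end{pmatrix}

and that

\begin{pmatrix} ψ_1 & \cdots & ψ_n \end{pmatrix}
\begin{pmatrix}
  λ_1    & 0      & \cdots & 0      \\
  0      & λ_2    & \ddots & \vdots \\
  \vdots & \ddots & \ddots & 0      \\
  0      & \cdots & 0      & λ_n
\end{pmatrix}
  = \begin{pmatrix} λ_1 ψ_1 & \cdots & λ_n ψ_n \end{pmatrix}.

Consequently, Condition (23.15) can be written as

AU = UΛ,                    (23.16)

where

U = \begin{pmatrix} ψ_1 & \cdots & ψ_n \end{pmatrix}
\quad and \quad
Λ = \begin{pmatrix}
  λ_1    & 0      & \cdots & 0      \\
  0      & λ_2    & \ddots & \vdots \\
  \vdots & \ddots & \ddots & 0      \\
  0      & \cdots & 0      & λ_n
\end{pmatrix}.                    (23.17)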

Condition (23.14) is equivalent to the condition that the above matrix U is
orthogonal. By multiplying (23.16) from the right by the inverse of U (which, because U
is orthogonal and by (23.8), is U^T) we obtain the equivalent form A = UΛU^T.
Consequently, an equivalent statement of Proposition 23.3.3 is:


Proposition 23.3.4 (Spectral Theorem for Real Symmetric Matrices). A
symmetric real n × n matrix A can be written in the form

A = UΛU^T

where, as in (23.17), Λ is a diagonal real n × n matrix whose diagonal elements
are the eigenvalues of A, and where U is a real n × n orthogonal matrix whose ν-th
column is an eigenvector of A corresponding to the eigenvalue in the ν-th position
on the diagonal of Λ.


The reverse is also true: if A = UΛU^T for a real diagonal matrix Λ and for a real
orthogonal matrix U, then A is symmetric, its eigenvalues are the diagonal elements
of Λ, and the ν-th column of U is an eigenvector of the matrix A corresponding to
the eigenvalue in the ν-th position on the diagonal of Λ.
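For a symmetric 2 × 2 matrix the spectral decomposition can be computed in closed form, which gives a quick numerical check of the theorem. The sketch below uses a single Jacobi rotation to find an orthogonal matrix of eigenvectors; this is one convenient technique that we substitute for illustration, and the example matrix and function names are hypothetical.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(column) for column in zip(*A)]

def spectral_decomposition_2x2(A):
    """Return (U, Lam) with A = U Lam U^T for a symmetric 2 x 2 matrix A."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    # Rotation angle that diagonalizes A (a single Jacobi rotation).
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    U = [[cs, -sn], [sn, cs]]  # orthogonal; its columns are eigenvectors
    D = matmul(transpose(U), matmul(A, U))  # diagonal up to rounding
    Lam = [[D[0][0], 0.0], [0.0, D[1][1]]]
    return U, Lam

A = [[2.0, 1.0], [1.0, 2.0]]  # hypothetical symmetric matrix
U, Lam = spectral_decomposition_2x2(A)
reconstructed = matmul(U, matmul(Lam, transpose(U)))
print(Lam[0][0], Lam[1][1])  # the eigenvalues, here approximately 3 and 1
print(reconstructed)         # approximately equal to A
```

For larger matrices one would normally rely on a numerical linear algebra library rather than on hand-written rotations.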
23.3.3 Positive Semidefinite Matrices


Definition 23.3.5 (Positive Semidefinite and Positive Definite Matrices).

(i) We say that the n × n real matrix K is positive semidefinite or
nonnegative definite and write

K ⪰ 0

if K is symmetric and

α^T K α ≥ 0,    α ∈ R^n.

(ii) We say that the n × n real matrix K is positive definite and write

K ≻ 0

if K is symmetric and

α^T K α > 0,    (α ≠ 0, α ∈ R^n).

The following two propositions characterize positive semidefinite and positive
definite matrices. For proofs, see (Axler, 1997, Chapter 7, Theorem 7.27).


Proposition 23.3.6 (Characterizing Positive Semidefinite Matrices). Let K be a
real n × n matrix. Then the statement that K is positive semidefinite is equivalent
to each of the following statements:

(a) The matrix K can be written in the form

K = S^T S                    (23.18)

for some real n × n matrix S.⁴

(b) The matrix K is symmetric and all its eigenvalues are nonnegative.

(c) The matrix K can be written in the form

K = UΛU^T,                    (23.19)

where Λ is a real n × n diagonal matrix with nonnegative entries on the
diagonal and where U is a real n × n orthogonal matrix.


Proposition 23.3.7 (Characterizing Positive Definite Matrices). Let K be a real
n × n matrix. Then the statement that K is positive definite is equivalent to each
of the following statements.

(a) The matrix K can be written in the form K = S^T S for some real n × n
nonsingular matrix S.

(b) The matrix K is symmetric and all its eigenvalues are positive.

(c) The matrix K can be written in the form

K = UΛU^T,

where Λ is a real n × n diagonal matrix with positive entries on the diagonal
and where U is a real n × n orthogonal matrix.

⁴ Even if S is not a square matrix, S^T S ⪰ 0.

Given a positive semidefinite matrix K, how can we find a matrix S satisfying
K = S^T S? In general, there can be many such matrices. For example, if K is the
identity matrix, then S can be any orthogonal matrix. We mention here two useful
choices. Being symmetric, the matrix K can be written in the form

K = UΛU^T,                    (23.20)

where U and Λ are as in (23.17). Since K is positive semidefinite, the diagonal
elements of Λ (which are the eigenvalues of K) are nonnegative. Consequently, we
can define the matrix

Λ^{1/2} = \begin{pmatrix}
  \sqrt{λ_1} & 0          & \cdots & 0          \\
  0          & \sqrt{λ_2} & \ddots & \vdots     \\
  \vdots     & \ddots     & \ddots & 0          \\
  0          & \cdots     & 0      & \sqrt{λ_n}
\end{pmatrix}.

One choice of the matrix S is

S = Λ^{1/2} U^T.                    (23.21)

Indeed, with this definition of S we have

S^T S = (Λ^{1/2} U^T)^T Λ^{1/2} U^T
      = U Λ^{1/2} Λ^{1/2} U^T
      = UΛU^T
      = K,

where the first equality follows from the definition of S; the second from the rule
(AB)^T = B^T A^T and from the symmetry of the diagonal matrix Λ^{1/2}; the third from
the definition of Λ^{1/2}; and where the final equality follows from (23.20).

A different choice for S, which will be less useful to us in this chapter, is⁵

U Λ^{1/2} U^T.

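As a small worked example of the two choices (the matrix K below is a hypothetical choice of ours, not one taken from the text), consider a 2 × 2 positive definite matrix with eigenvalues 3 and 1:

```latex
\[
K = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
\Lambda = \begin{pmatrix} 3 & 0 \\ 0 & 1 \end{pmatrix}.
\]
% First choice: S = \Lambda^{1/2} U^T
\[
S = \Lambda^{1/2} U^{\mathsf{T}}
  = \frac{1}{\sqrt{2}} \begin{pmatrix} \sqrt{3} & \sqrt{3} \\ 1 & -1 \end{pmatrix},
\qquad
S^{\mathsf{T}} S
  = \frac{1}{2} \begin{pmatrix} \sqrt{3} & 1 \\ \sqrt{3} & -1 \end{pmatrix}
                \begin{pmatrix} \sqrt{3} & \sqrt{3} \\ 1 & -1 \end{pmatrix}
  = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}.
\]
% Second choice: the symmetric square root U \Lambda^{1/2} U^T
\[
U \Lambda^{1/2} U^{\mathsf{T}}
  = \frac{1}{2} \begin{pmatrix} \sqrt{3}+1 & \sqrt{3}-1 \\ \sqrt{3}-1 & \sqrt{3}+1 \end{pmatrix},
\qquad
\bigl( U \Lambda^{1/2} U^{\mathsf{T}} \bigr)^{2}
  = \frac{1}{4} \begin{pmatrix} 8 & 4 \\ 4 & 8 \end{pmatrix}
  = K.
\]
```

The first choice is not symmetric, whereas the second is symmetric and positive semidefinite, so for it the product of the transpose with the matrix is simply its square.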
The following lemmas will be used in Section 23.4.3 when we study random vectors 
of singular covariance matrices. 


Lemma 23.3.8. Let K be a real n × n positive semidefinite matrix, and let α be a
vector in R^n. Then α^T K α = 0 if, and only if, Kα = 0.


⁵ This is the only choice for S that is positive semidefinite (Axler, 1997, Chapter 7,
Proposition 7.28), (Horn and Johnson, 1985, Chapter 7, Section 7.2, Theorem 7.2.6).
Proof. One direction is trivial and does not require that K be positive semidefinite:
if Kα = 0, then α^T K α must also be equal to zero. Indeed, in this case we have by
the associativity of matrix multiplication α^T K α = α^T (Kα) = α^T 0 = 0.


To prove the other direction, we first note that, since K is positive semidefinite,
there exists some n × n matrix S such that K = S^T S. Hence,

α^T K α = α^T S^T S α
        = (Sα)^T (Sα)
        = ‖Sα‖^2,    α ∈ R^n,

where the second equality follows from the rule for transposing a product (23.3),
and where the third equality follows from (23.5). Consequently, if α^T K α = 0, then
‖Sα‖^2 = 0, so Sα = 0, and hence S^T S α = 0, i.e., Kα = 0.


Lemma 23.3.9. If K is a real n × n positive definite matrix, then α^T K α = 0 if,
and only if, α = 0.


Proof. Follows directly from Definition 23.3.5 of positive definite matrices.


23.4 Random Vectors 


23.4.1 Definitions 


Recall that an n-dimensional random vector or a random n-vector X
defined over the probability space (Ω, ℱ, P) is a (measurable) mapping from the set
of experiment outcomes Ω to the n-dimensional Euclidean space R^n. A random
vector X is very much like a random variable, except that rather than taking value
in the real line R, it takes value in R^n. In fact, an n-dimensional random vector
can be viewed as an array of n random variables.⁶


The density of a random vector is the joint density of its components. The density 
of a random n-vector is thus a nonnegative (Borel measurable) function from R” 
to the nonnegative reals that integrates to one. 


Similarly, an n x m random matrix H is an n x m array of random variables defined 
over a common probability space. 


23.4.2 Expectations and Covariance Matrices 


The expectation E[X] of a random n-vector X = (X^{(1)}, …, X^{(n)})^T is a vector
whose components are the expectations of the corresponding components of X:⁷

E[X] ≜ (E[X^{(1)}], …, E[X^{(n)}])^T.                    (23.22)

The j-th element of E[X] is thus the expectation of the j-th component of X,
namely, E[X^{(j)}]. Similarly, the expectation of a random matrix is the matrix of
expectations.

⁶ In dealing with random vectors one often abandons the “coordinate free” approach and views
vectors in a particular coordinate system. This allows one to speak of the covariance matrix in
more familiar terms.

⁷ The expectation of a random vector is only defined if the expectation of each of its
components is defined.


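If all the components of a random n-vector X are of finite variance, then we say
that X is of finite variance. We then define its n × n covariance matrix K_XX
as

K_XX ≜ E[(X − E[X]) (X − E[X])^T].                    (23.23)

That is,

K_XX = E\left[ \begin{pmatrix} X^{(1)} - E[X^{(1)}] \\ \vdots \\ X^{(n)} - E[X^{(n)}] \end{pmatrix}
       \begin{pmatrix} X^{(1)} - E[X^{(1)}] & \cdots & X^{(n)} - E[X^{(n)}] \end{pmatrix} \right]
     = \begin{pmatrix}
         Var[X^{(1)}]          & Cov[X^{(1)}, X^{(2)}] & \cdots & Cov[X^{(1)}, X^{(n)}] \\
         Cov[X^{(2)}, X^{(1)}] & Var[X^{(2)}]          & \cdots & Cov[X^{(2)}, X^{(n)}] \\
         \vdots                & \vdots                & \ddots & \vdots                \\
         Cov[X^{(n)}, X^{(1)}] & Cov[X^{(n)}, X^{(2)}] & \cdots & Var[X^{(n)}]
       \end{pmatrix}.                    (23.24)

If n = 1 and the n-dimensional random vector X is hence a scalar, then the covariance
matrix K_XX is a 1 × 1 matrix whose sole component is the variance of the sole
component of X.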


Note that from the n × n covariance matrix K_XX of a random n-vector X it is easy
to compute the covariance matrix of a subset of X’s components. For example, if
we are only interested in the 2 × 2 covariance matrix of (X^{(1)}, X^{(2)})^T, then we just
pick the first two columns and the first two rows of K_XX. More generally, the r × r
covariance matrix of (X^{(j_1)}, X^{(j_2)}, …, X^{(j_r)})^T for 1 ≤ j_1 < j_2 < ⋯ < j_r ≤ n is
obtained from K_XX by picking Rows and Columns j_1, …, j_r. For example, if

K_XX = \begin{pmatrix}
  30 & 31 & 9  & 7  \\
  31 & 39 & 11 & 13 \\
  9  & 11 & 9  & 12 \\
  7  & 13 & 12 & 26
\end{pmatrix},


then the covariance matrix of (X^{(1)}, X^{(3)})^T is \begin{pmatrix} 30 & 9 \\ 9 & 9 \end{pmatrix}.
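The row-and-column picking rule is a one-line operation in code. The sketch below applies it to the 4 × 4 matrix of the example above, with the entry in Row 2 and Column 4 read as 13, as the symmetry of the matrix requires; the function name is ours.

```python
def sub_covariance(K, indices):
    """Covariance matrix of the components listed in `indices` (1-based).

    Picks the rows and columns of K indexed by `indices`, in the given order.
    """
    return [[K[j - 1][l - 1] for l in indices] for j in indices]

K = [[30, 31, 9, 7],
     [31, 39, 11, 13],
     [9, 11, 9, 12],
     [7, 13, 12, 26]]

print(sub_covariance(K, [1, 2]))  # [[30, 31], [31, 39]]
print(sub_covariance(K, [1, 3]))  # [[30, 9], [9, 9]]
```

Using 1-based indices keeps the code aligned with the numbering of the components in the text.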


We next explore the behavior of the mean vector and the covariance matrix of
a random vector when it is multiplied by a deterministic matrix. Regarding the
mean, we shall show that since matrix multiplication is a linear transformation, it
commutes with the expectation operation. Consequently, if H is a random n × m
matrix and A is a deterministic ν × n matrix, then

E[AH] = A E[H],                    (23.25a)

and similarly if B is a deterministic m × ν matrix, then

E[HB] = E[H] B.                    (23.25b)




To prove (23.25a) we write out the Row-j Column-ℓ element of the ν × m
matrix E[AH] and use the linearity of expectation to relate it to the Row-j Column-ℓ
element of the matrix A E[H]:

[E[AH]]_{j,ℓ} = E\Bigl[ \sum_{κ=1}^{n} [A]_{j,κ} [H]_{κ,ℓ} \Bigr]
              = \sum_{κ=1}^{n} E\bigl[ [A]_{j,κ} [H]_{κ,ℓ} \bigr]
              = \sum_{κ=1}^{n} [A]_{j,κ} E\bigl[ [H]_{κ,ℓ} \bigr]
              = [A E[H]]_{j,ℓ},    j ∈ {1, …, ν},  ℓ ∈ {1, …, m}.


The proof of (23.25b) is almost identical and is omitted. 


The transpose operation also commutes with expectation: if H is a random matrix
then

E[H^T] = (E[H])^T.                    (23.26)


As to the covariance matrix, we next show that if A is a deterministic matrix and
if X is a random vector, then the covariance matrix K_YY of the random vector
Y = AX can be expressed in terms of the covariance matrix K_XX of X as

K_YY = A K_XX A^T,    Y = AX.                    (23.27)

Indeed,

K_YY = E[(Y − E[Y]) (Y − E[Y])^T]
     = E[(AX − E[AX]) (AX − E[AX])^T]
     = E[A(X − E[X]) (A(X − E[X]))^T]
     = E[A(X − E[X]) (X − E[X])^T A^T]
     = A E[(X − E[X]) (X − E[X])^T A^T]
     = A E[(X − E[X]) (X − E[X])^T] A^T
     = A K_XX A^T,

where the third equality follows from (23.25a); the fourth from (23.3); the fifth
from (23.25a); and the sixth from (23.25b).
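The transformation rule (23.27) is a pair of matrix products, which the following sketch evaluates exactly with integer arithmetic. The covariance matrix and the matrix A below are hypothetical examples of ours.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(column) for column in zip(*A)]

# Hypothetical covariance matrix of a random 2-vector X.
K_XX = [[2, 1],
        [1, 3]]
# Hypothetical deterministic 3 x 2 matrix, so that Y = A X is a 3-vector.
A = [[1, 1],
     [1, -1],
     [0, 2]]

K_YY = matmul(A, matmul(K_XX, transpose(A)))  # the rule (23.27)
print(K_YY)  # [[7, -1, 8], [-1, 3, -4], [8, -4, 12]]
```

In this example the third component of Y equals the first minus the second, and correspondingly the third column of the result equals the first column minus the second, so the 3 × 3 result is singular.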


A key property of covariance matrices is that, as we shall next show, they are all
positive semidefinite. That is, the covariance matrix K_XX of any random vector X
is a symmetric matrix satisfying

α^T K_XX α ≥ 0,    α ∈ R^n.                    (23.28)


(In Proposition 23.6.1 we shall see that this property fully characterizes covari- 
ance matrices: every positive semidefinite matrix is the covariance matrix of some 
random vector.) 


To prove (23.28) it suffices to consider the case where X is of zero mean because
the covariance matrix of X is the same as the covariance matrix of X − E[X]. The
symmetry of K_XX follows from the definition of the covariance matrix (23.23); from
the fact that expectation and transposition commute (23.26); and from the formula
for the transpose of a product of matrices (23.3):

K_XX^T = (E[X X^T])^T
       = E[(X X^T)^T]
       = E[X X^T]
       = K_XX.                    (23.29)

The nonnegativity of α^T K_XX α for any deterministic α ∈ R^n follows by noting
that by (23.27) (applied with A = α^T) the term α^T K_XX α is the variance of the
scalar random variable α^T X, i.e.,

α^T K_XX α = Var[α^T X]                    (23.30)


and, as such, is nonnegative. 


23.4.3 Singular Covariance Matrices


A random vector having a singular covariance matrix can be unwieldy because it 
cannot have a density function. Indeed, as we shall see in Corollary 23.4.2, any such 
random vector has at least one component that is determined (with probability one) 
by the other components. In this section we shall propose a way of manipulating 
such vectors. Roughly speaking, the idea is that if X has a singular covariance 
matrix, then we choose a subset of its components so that the covariance matrix of 
the chosen subset be nonsingular and so that each component that was not chosen 
be equal (with probability one) to a deterministic affine function of the chosen 
components. We then manipulate only the chosen components and, with some 
deterministic bookkeeping “on the side,” take care of the components that were 
not chosen. This idea is made precise in Corollary 23.4.3. 


To illustrate the idea, suppose that X is a zero-mean random vector of covariance 
matrix 


\[ \mathsf{K}_{\mathbf{XX}} = \begin{pmatrix} 3 & 5 & 7 \\ 5 & 9 & 13 \\ 7 & 13 & 19 \end{pmatrix}. \]


An application of Proposition 23.4.1 ahead will show that because the three columns
of $\mathsf{K}_{\mathbf{XX}}$ satisfy the linear relationship

\[ -\begin{pmatrix} 3 \\ 5 \\ 7 \end{pmatrix} + 2 \begin{pmatrix} 5 \\ 9 \\ 13 \end{pmatrix} - \begin{pmatrix} 7 \\ 13 \\ 19 \end{pmatrix} = \mathbf{0}, \]

it follows that

\[ -X^{(1)} + 2X^{(2)} - X^{(3)} = 0, \quad \text{with probability one.} \]

Consequently, in manipulating $\mathbf{X}$ we can pick the two components $X^{(2)}, X^{(3)}$,
which are of nonsingular covariance matrix $\bigl(\begin{smallmatrix} 9 & 13 \\ 13 & 19 \end{smallmatrix}\bigr)$ (obtained by picking the last
two rows and the last two columns of $\mathsf{K}_{\mathbf{XX}}$), and keep track “on the side” of the
fact that $X^{(1)}$ is equal, with probability one, to $2X^{(2)} - X^{(3)}$. We could, of course,
also pick the components $X^{(1)}, X^{(2)}$ of nonsingular covariance matrix $\bigl(\begin{smallmatrix} 3 & 5 \\ 5 & 9 \end{smallmatrix}\bigr)$ and
keep track “on the side” of the relationship $X^{(3)} = 2X^{(2)} - X^{(1)}$.
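The arithmetic behind this example can be verified exactly with integers. The sketch below uses only the Python standard library; the helper names `sub_det` and `det3` are hypothetical and introduced purely for illustration.

```python
# Covariance matrix of the example, with integer entries.
K = [[3, 5, 7],
     [5, 9, 13],
     [7, 13, 19]]

# Coefficients of the column relation: -col1 + 2*col2 - col3.
alpha = [-1, 2, -1]
col_relation = [sum(K[r][c] * alpha[c] for c in range(3)) for r in range(3)]

def sub_det(K, i, j):
    """Determinant of the 2 x 2 submatrix of K on rows and columns i, j."""
    return K[i][i] * K[j][j] - K[i][j] * K[j][i]

def det3(K):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (K[0][0] * (K[1][1] * K[2][2] - K[1][2] * K[2][1])
            - K[0][1] * (K[1][0] * K[2][2] - K[1][2] * K[2][0])
            + K[0][2] * (K[1][0] * K[2][1] - K[1][1] * K[2][0]))

# Determinants of the covariance matrices of each pair of components.
pair_dets = {(i, j): sub_det(K, i, j) for i, j in [(0, 1), (0, 2), (1, 2)]}

print(col_relation, det3(K), pair_dets)
```

The full matrix is singular, whereas every pair of components has a nonsingular covariance matrix, so any two components can serve as the chosen subset.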


To avoid cumbersome language, for the remainder of this section we shall take all 
equalities between random variables to stand for equalities with probability one. 
Thus, if we write $X^{(1)} = 2X^{(2)} - X^{(3)}$ we mean that the probability that $X^{(1)}$ is
equal to $2X^{(2)} - X^{(3)}$ is one.


The justification of the procedure is in the following proposition and its two corol- 
laries. 


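Proposition 23.4.1. Let $\mathbf{X}$ be a zero-mean random $n$-vector of covariance matrix
$\mathsf{K}_{\mathbf{XX}}$. Then its $\ell$-th component $X^{(\ell)}$ is a deterministic linear combination of
$X^{(\ell_1)}, \dots, X^{(\ell_\eta)}$ if, and only if, the $\ell$-th column of $\mathsf{K}_{\mathbf{XX}}$ is a linear combination of
Columns $\ell_1, \dots, \ell_\eta$. Here $\ell, \eta, \ell_1, \dots, \ell_\eta \in \{1, \dots, n\}$ are arbitrary.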


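Proof. If $\ell \in \{\ell_1, \dots, \ell_\eta\}$, then the result is trivial. We shall therefore present a
proof only for the case where $\ell \notin \{\ell_1, \dots, \ell_\eta\}$. In this case, the $\ell$-th component of
the random $n$-vector $\mathbf{X}$ is a linear combination of the $\eta$ components $X^{(\ell_1)}, \dots, X^{(\ell_\eta)}$
if, and only if, there exists a vector $\boldsymbol{\alpha} \in \mathbb{R}^n$ satisfying

\[ \alpha^{(\ell)} = -1, \tag{23.31a} \]
\[ \alpha^{(\kappa)} = 0, \quad \kappa \notin \{\ell, \ell_1, \dots, \ell_\eta\}, \tag{23.31b} \]

and

\[ \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X} = 0. \tag{23.31c} \]

Since $\mathbf{X}$ is of zero mean, the condition $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X} = 0$ is equivalent to the condition
$\operatorname{Var}[\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}] = 0$. By (23.30) and Lemma 23.3.8 this latter condition is equivalent
to the condition $\mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha} = \mathbf{0}$. Now $\mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha}$ is a linear combination of the columns
of $\mathsf{K}_{\mathbf{XX}}$ where the first column is multiplied by $\alpha^{(1)}$, the second by $\alpha^{(2)}$, etc. Consequently,
the condition that $\mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\alpha} = \mathbf{0}$ for some $\boldsymbol{\alpha} \in \mathbb{R}^n$ satisfying (23.31a) &
(23.31b) is equivalent to the condition that the $\ell$-th column of $\mathsf{K}_{\mathbf{XX}}$ is a linear
combination of Columns $\ell_1, \dots, \ell_\eta$.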


Corollary 23.4.2. The covariance matrix of a zero-mean random n-vector X is 
singular if, and only if, some component of X is a linear combination of the other 
components. 


Proof. Follows from Proposition 23.4.1 by noting that a square matrix is singular 
if, and only if, its columns are linearly dependent. 


Corollary 23.4.3. Let $\mathbf{X}$ be a zero-mean random $n$-vector of covariance matrix $\mathsf{K}_{\mathbf{XX}}$.
If Columns $\ell_1, \dots, \ell_d$ of $\mathsf{K}_{\mathbf{XX}}$ form a basis for the subspace of $\mathbb{R}^n$ spanned by the
columns of $\mathsf{K}_{\mathbf{XX}}$, then every component of $\mathbf{X}$ can be written as a linear combination
of the components $X^{(\ell_1)}, \dots, X^{(\ell_d)}$, and the random $d$-vector $\bigl(X^{(\ell_1)}, \dots, X^{(\ell_d)}\bigr)^{\mathsf{T}}$
has a nonsingular $d \times d$ covariance matrix.

Proof. Since Columns $\ell_1, \dots, \ell_d$ form a basis for the subspace spanned by the
columns of $\mathsf{K}_{\mathbf{XX}}$, every column $\ell$ can be written as a linear combination of these
columns. Consequently, by Proposition 23.4.1, every component of $\mathbf{X}$ can be written
as a linear combination of $X^{(\ell_1)}, \dots, X^{(\ell_d)}$. To prove that the $d \times d$ covariance
matrix $\mathsf{K}_{\tilde{\mathbf{X}}\tilde{\mathbf{X}}}$ of the random $d$-vector $\tilde{\mathbf{X}} = \bigl(X^{(\ell_1)}, \dots, X^{(\ell_d)}\bigr)^{\mathsf{T}}$ is nonsingular, we
note that if this were not the case, then by Corollary 23.4.2 applied to $\tilde{\mathbf{X}}$ it would
follow that one of the components of $\tilde{\mathbf{X}}$ is a linear combination of the other $d - 1$
components. But by Proposition 23.4.1 applied to $\mathbf{X}$, this would imply that the
columns $\ell_1, \dots, \ell_d$ of $\mathsf{K}_{\mathbf{XX}}$ are not linearly independent, in contradiction to the
corollary’s hypothesis that they form a basis.


23.4.4 The Characteristic Function 


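If $\mathbf{X}$ is a random $n$-vector, then its characteristic function $\Phi_{\mathbf{X}}(\cdot)$ is a mapping
from $\mathbb{R}^n$ to $\mathbb{C}$ that maps each vector $\boldsymbol{\varpi} = (\varpi^{(1)}, \dots, \varpi^{(n)})^{\mathsf{T}}$ in $\mathbb{R}^n$ to $\Phi_{\mathbf{X}}(\boldsymbol{\varpi})$, where

\[ \Phi_{\mathbf{X}}(\boldsymbol{\varpi}) \triangleq \mathsf{E}\bigl[e^{i \boldsymbol{\varpi}^{\mathsf{T}} \mathbf{X}}\bigr]. \]

If $\mathbf{X}$ has the density $f_{\mathbf{X}}(\cdot)$, then

\[ \Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} f_{\mathbf{X}}(\mathbf{x})\, e^{i \sum_{\ell=1}^{n} \varpi^{(\ell)} x^{(\ell)}} \, dx^{(1)} \cdots dx^{(n)}, \]

which is reminiscent of the multi-dimensional Fourier Transform of $f_{\mathbf{X}}(\cdot)$ (ignoring
$2\pi$’s and the sign of $i$).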


Proposition 23.4.4 (Identical Distributions and Characteristic Functions). Two 
random n-vectors X,Y are of the same distribution if, and only if, they have 
identical characteristic functions: 


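\[ \bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathbf{Y}\bigr) \iff \bigl(\Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = \Phi_{\mathbf{Y}}(\boldsymbol{\varpi}), \quad \boldsymbol{\varpi} \in \mathbb{R}^n\bigr). \tag{23.32} \]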


Proof. See (Dudley, 2003, Chapter 9, Section 5, Theorem 9.5.1). 


This proposition is extremely useful. We shall demonstrate its power by using it 
to show that two random variables X and Y are independent if, and only if, 


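\[ \mathsf{E}\bigl[e^{i(\varpi_1 X + \varpi_2 Y)}\bigr] = \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}. \tag{23.33} \]

One direction is straightforward. If $X$ and $Y$ are independent, then for any Borel
measurable functions $g(\cdot)$ and $h(\cdot)$ the random variables $g(X)$ and $h(Y)$ are also
independent. Thus, the independence of $X$ and $Y$ implies the independence of the
random variables $e^{i\varpi_1 X}$ and $e^{i\varpi_2 Y}$ and hence implies that the expectation of their
product is the product of their expectations:

\[
\begin{aligned}
\mathsf{E}\bigl[e^{i(\varpi_1 X + \varpi_2 Y)}\bigr] &= \mathsf{E}\bigl[e^{i\varpi_1 X} e^{i\varpi_2 Y}\bigr] \\
&= \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R}.
\end{aligned}
\]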


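As to the other direction, suppose that $X'$ has the same law as $X$, that $Y'$ has the
same law as $Y$, and that $X'$ and $Y'$ are independent. Since $X'$ has the same law
as $X$, it follows that

\[ \mathsf{E}\bigl[e^{i\varpi_1 X'}\bigr] = \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr], \quad \varpi_1 \in \mathbb{R}, \tag{23.34} \]

and similarly for $Y'$

\[ \mathsf{E}\bigl[e^{i\varpi_2 Y'}\bigr] = \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_2 \in \mathbb{R}. \tag{23.35} \]

Consequently, since $X'$ and $Y'$ are independent,

\[
\begin{aligned}
\mathsf{E}\bigl[e^{i(\varpi_1 X' + \varpi_2 Y')}\bigr] &= \mathsf{E}\bigl[e^{i\varpi_1 X'} e^{i\varpi_2 Y'}\bigr] \\
&= \mathsf{E}\bigl[e^{i\varpi_1 X'}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y'}\bigr] \\
&= \mathsf{E}\bigl[e^{i\varpi_1 X}\bigr]\, \mathsf{E}\bigl[e^{i\varpi_2 Y}\bigr], \quad \varpi_1, \varpi_2 \in \mathbb{R},
\end{aligned}
\]

where the third equality follows from (23.34) and (23.35).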


We thus see that if (23.33) holds, then the characteristic function of the vector
$(X, Y)^{\mathsf{T}}$ is identical to the characteristic function of the vector $(X', Y')^{\mathsf{T}}$. By
Proposition 23.4.4 the joint distribution of $(X, Y)$ must then be the same as the
joint distribution of $(X', Y')$. Since according to the latter distribution the two
components are independent, it follows that the same must be true according to
the former, i.e., $X$ and $Y$ must be independent.


23.5 A Standard Gaussian Vector 


Recall Definition 23.1.1 that a random n-vector W is a standard Gaussian if its n 
components are independent zero-mean unit-variance Gaussian random variables. 
Its density fw(-) is then given by 


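\[ f_{\mathbf{W}}(\mathbf{w}) = (2\pi)^{-n/2}\, e^{-\frac{1}{2}\|\mathbf{w}\|^2}, \quad \mathbf{w} \in \mathbb{R}^n. \tag{23.36} \]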


The definition of a standard Gaussian random vector is an extension of the definition
of a standard Gaussian random variable: the sole component of a standard
one-dimensional Gaussian vector is a scalar $\mathcal{N}(0,1)$ random variable. Conversely,
every $\mathcal{N}(0,1)$ random variable can be viewed as a one-dimensional standard Gaussian.


If W is a standard Gaussian random n-vector then, as we next show, its mean 
vector and covariance matrix are given by 


\[ \mathsf{E}[\mathbf{W}] = \mathbf{0}, \quad \text{and} \quad \mathsf{K}_{\mathbf{WW}} = \mathsf{I}_n. \tag{23.37} \]


Indeed, the mean of a random vector is the vector of the means (23.22), so the 
fact that E[W] = 0 is a consequence of all the components of W having zero 
mean. And using (23.24) it can be easily shown that the covariance matrix of W 
is the identity matrix because the components of W are independent and hence,
a fortiori, uncorrelated, and because they are each of unit variance.


23.6 Gaussian Random Vectors 


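Recall Definition 23.1.1 that a random $n$-vector $\mathbf{X}$ is said to be Gaussian if for some
positive integer $m$ there exists an $n \times m$ matrix $\mathsf{A}$; a standard Gaussian random
$m$-vector $\mathbf{W}$; and a deterministic vector $\boldsymbol{\mu} \in \mathbb{R}^n$ such that

\[ \mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu}. \tag{23.38} \]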


From (23.38), from the second order properties of standard Gaussians (23.37), 
and from the behavior of the mean vector and covariance matrix under linear 
transformation (23.25a) & (23.27) we obtain 


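\[ \bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu} \ \text{and} \ \mathbf{W} \ \text{standard}\bigr) \implies \bigl(\mathsf{E}[\mathbf{X}] = \boldsymbol{\mu} \ \text{and} \ \mathsf{K}_{\mathbf{XX}} = \mathsf{A}\mathsf{A}^{\mathsf{T}}\bigr). \tag{23.39} \]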


Recall also that $\mathbf{X}$ is a centered Gaussian if $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$ for $\mathsf{A}$ and $\mathbf{W}$ as above.


Every standard Gaussian vector is a centered Gaussian because every standard
Gaussian $n$-vector $\mathbf{W}$ is equal to $\mathsf{A}\mathbf{W}$ when $\mathsf{A}$ is the $n \times n$ identity matrix $\mathsf{I}_n$.
The reverse is not true: not every centered Gaussian is a standard Gaussian.
Indeed, standard Gaussians have the identity covariance matrix (23.37), whereas
the centered Gaussian vector $\mathsf{A}\mathbf{W}$ has, by (23.39), the covariance matrix $\mathsf{A}\mathsf{A}^{\mathsf{T}}$,
which need not be the identity matrix.


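Also, $\mathbf{X}$ is a Gaussian vector if, and only if, $\mathbf{X} - \mathsf{E}[\mathbf{X}]$ is a centered Gaussian
because, by (23.39),

\[
\begin{aligned}
&\bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu} \ \text{for some} \ \boldsymbol{\mu} \in \mathbb{R}^n \ \text{and} \ \mathbf{W} \ \text{standard Gaussian}\bigr) \\
&\quad \iff \bigl(\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \mathsf{E}[\mathbf{X}] \ \text{and} \ \mathbf{W} \ \text{standard Gaussian}\bigr) \\
&\quad \iff \bigl(\mathbf{X} - \mathsf{E}[\mathbf{X}] \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} \ \text{and} \ \mathbf{W} \ \text{standard Gaussian}\bigr).
\end{aligned}
\tag{23.40}
\]

From (23.40) it also follows that the centered Gaussians are the Gaussian vectors
of zero mean.⁸


⁸ Thus, the name “centered Gaussian,” which we gave in Definition 23.1.1, was not misleading.
A vector is a “centered Gaussian” if, and only if, it is Gaussian and centered.

Using the definition of a centered Gaussian and using (23.39) we can readily show
that every positive semidefinite matrix is the covariance matrix of some centered
Gaussian. In fact, more is true: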


Proposition 23.6.1 (Covariance Matrices and Positive Semidefinite Matrices). 
The covariance matrix of every finite-variance random vector is positive semidefi- 
nite, and every positive semidefinite matrix is the covariance matrix of some cen- 
tered Gaussian random vector. 


Proof. The covariance matrix of every random vector is positive semidefinite because
every covariance matrix is symmetric (23.29) and satisfies (23.28). We next
establish the reverse. Given an $n \times n$ positive semidefinite matrix $\mathsf{K}$ we shall construct
a centered Gaussian $\mathbf{X}$ whose covariance matrix $\mathsf{K}_{\mathbf{XX}}$ is equal to $\mathsf{K}$. We begin
by noting that, since $\mathsf{K}$ is positive semidefinite, it follows from Proposition 23.3.6
that there exists some $n \times n$ matrix $\mathsf{S}$ such that $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}$. Let $\mathbf{W}$ be a standard
Gaussian $n$-vector and consider the vector $\mathbf{X} = \mathsf{S}^{\mathsf{T}}\mathbf{W}$. Being the result of a linear
transformation of the standard Gaussian $\mathbf{W}$, this vector is a centered Gaussian. We
complete the proof by showing that its covariance matrix $\mathsf{K}_{\mathbf{XX}}$ is the prespecified
matrix $\mathsf{K}$. This follows from the calculation

\[
\begin{aligned}
\mathsf{K}_{\mathbf{XX}} &= \mathsf{S}^{\mathsf{T}}\mathsf{S} \\
&= \mathsf{K},
\end{aligned}
\]

where the first equality follows from (23.39) (by substituting $\mathsf{S}^{\mathsf{T}}$ for $\mathsf{A}$ and $\mathbf{0}$ for $\boldsymbol{\mu}$)
and the second from our choice of $\mathsf{S}$ as satisfying $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}$.
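The construction in this proof can be tried out numerically. The sketch below assumes NumPy; it obtains one admissible factor S from an eigendecomposition, which is only one of many ways to satisfy S^T S = K, and the function name `factor_psd`, the test matrix, and the sample size are illustrative assumptions.

```python
import numpy as np

def factor_psd(K):
    """Return S with S.T @ S == K (up to rounding) for a symmetric PSD K."""
    eigvals, U = np.linalg.eigh(K)           # K = U diag(eigvals) U^T
    eigvals = np.clip(eigvals, 0.0, None)    # remove tiny negative rounding errors
    return np.diag(np.sqrt(eigvals)) @ U.T   # S = Lambda^{1/2} U^T

# A singular but positive semidefinite matrix.
K = np.array([[3.0, 5.0, 7.0],
              [5.0, 9.0, 13.0],
              [7.0, 13.0, 19.0]])
S = factor_psd(K)

rng = np.random.default_rng(1)
W = rng.standard_normal((3, 200_000))   # columns are standard Gaussian 3-vectors
X = S.T @ W                             # columns are draws of X = S^T W
K_hat = np.cov(X)                       # sample covariance (rows are variables)

print(np.round(S.T @ S, 6))
print(np.round(K_hat, 2))
```

The product S^T S reproduces K exactly up to rounding, and the sample covariance of the generated vectors approaches K as the number of draws grows.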


23.6.1 Examples and Basic Properties 


In this section we provide some examples of Gaussian vectors and some simple 
properties that follow from their definition. 


(i) Every univariate $\mathcal{N}(\mu, \sigma^2)$ random variable, when viewed as a one-dimensional
random vector, is a Gaussian random vector.


Proof: Such a univariate random variable has the same law as
$\sigma W + \mu$, when $W$ is a standard univariate Gaussian.


(ii) Any deterministic vector is a Gaussian vector. 
Proof: Choose the matrix A as the all-zero matrix 0. 


(iii) If the components of X are independent univariate Gaussians (not necessarily 
of equal variance), then X is a Gaussian vector. 


Proof: Choose A to be an appropriate diagonal matrix. 


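For the purposes of stating the next proposition we remind the reader that the random
vectors $\mathbf{X} = (X^{(1)}, \dots, X^{(n_x)})^{\mathsf{T}}$ and $\mathbf{Y} = (Y^{(1)}, \dots, Y^{(n_y)})^{\mathsf{T}}$ are independent
if, for every choice of $\xi_1, \dots, \xi_{n_x} \in \mathbb{R}$ and $\eta_1, \dots, \eta_{n_y} \in \mathbb{R}$,

\[
\begin{aligned}
&\Pr\bigl[X^{(1)} \le \xi_1, \dots, X^{(n_x)} \le \xi_{n_x},\, Y^{(1)} \le \eta_1, \dots, Y^{(n_y)} \le \eta_{n_y}\bigr] \\
&\quad = \Pr\bigl[X^{(1)} \le \xi_1, \dots, X^{(n_x)} \le \xi_{n_x}\bigr] \, \Pr\bigl[Y^{(1)} \le \eta_1, \dots, Y^{(n_y)} \le \eta_{n_y}\bigr].
\end{aligned}
\]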


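The following proposition is a consequence of the fact that if $\mathbf{X}_1$ & $\mathbf{X}_2$ are independent,
$\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathbf{X}_1'$, $\mathbf{X}_2 \stackrel{\mathscr{L}}{=} \mathbf{X}_2'$, and $\mathbf{X}_1'$ & $\mathbf{X}_2'$ are independent, then the stacked
vectors are of the same law:

\[ \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix} \stackrel{\mathscr{L}}{=} \begin{pmatrix} \mathbf{X}_1' \\ \mathbf{X}_2' \end{pmatrix}. \]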


Proposition 23.6.2 (Stacking Independent Gaussian Vectors). Stacking two in- 
dependent Gaussian vectors one on top of the other results in a Gaussian vector. 


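Proof. Let the random $n_1$-vector $\mathbf{X}_1 = (X_1^{(1)}, \dots, X_1^{(n_1)})^{\mathsf{T}}$ be Gaussian, and let the
random $n_2$-vector $\mathbf{X}_2 = (X_2^{(1)}, \dots, X_2^{(n_2)})^{\mathsf{T}}$ be Gaussian and independent of $\mathbf{X}_1$.
We need to show that the $(n_1 + n_2)$-vector

\[ \bigl(X_1^{(1)}, \dots, X_1^{(n_1)}, X_2^{(1)}, \dots, X_2^{(n_2)}\bigr)^{\mathsf{T}} \tag{23.41} \]

is Gaussian.

Let the pair $(\mathsf{A}_1, \boldsymbol{\mu}_1)$ represent $\mathbf{X}_1$ in the sense that $\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$, where $\mathsf{A}_1$
is $n_1 \times m_1$, $\boldsymbol{\mu}_1 \in \mathbb{R}^{n_1}$, and $\mathbf{W}_1$ is a standard Gaussian $m_1$-vector. Similarly, let
the pair $(\mathsf{A}_2, \boldsymbol{\mu}_2)$ represent $\mathbf{X}_2$, where $\mathsf{A}_2$ is $n_2 \times m_2$ and $\boldsymbol{\mu}_2 \in \mathbb{R}^{n_2}$. We next
show that the vector (23.41) can be represented using the $(n_1 + n_2) \times (m_1 + m_2)$
block-diagonal matrix $\mathsf{A}$ of diagonal components $\mathsf{A}_1$ and $\mathsf{A}_2$, and using the vector
$\boldsymbol{\mu} \in \mathbb{R}^{n_1 + n_2}$ that results when the vector $\boldsymbol{\mu}_1$ is stacked on top of the vector $\boldsymbol{\mu}_2$:

\[ \mathsf{A} = \begin{pmatrix} \mathsf{A}_1 & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_2 \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}. \tag{23.42} \]

Indeed, if $\mathbf{W}$ is a standard Gaussian $(m_1 + m_2)$-vector and if we denote by $\mathbf{W}_1$ its
first $m_1$ components and by $\mathbf{W}_2$ its last $m_2$ components, then the random vectors
$\mathbf{W}_1$ and $\mathbf{W}_2$ are independent, and each is a standard Gaussian. Consequently,

\[
\begin{aligned}
\mathsf{A}\mathbf{W} + \boldsymbol{\mu} &= \begin{pmatrix} \mathsf{A}_1 & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_2 \end{pmatrix} \begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix} + \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix} \\
&= \begin{pmatrix} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1 \\ \mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2 \end{pmatrix} \\
&\stackrel{\mathscr{L}}{=} \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix},
\end{aligned}
\]

where the first equality follows from the definition of $\mathsf{A}$ and $\boldsymbol{\mu}$ in (23.42); the
second equality by computing the matrix product in blocks; and where the equality
in distribution follows because the fact that $\mathbf{W}_1$ is a standard Gaussian implies
that $\mathbf{X}_1 \stackrel{\mathscr{L}}{=} \mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$, the fact that $\mathbf{W}_2$ is a standard Gaussian implies that
$\mathbf{X}_2 \stackrel{\mathscr{L}}{=} \mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2$, and the fact that $\mathbf{W}_1$ and $\mathbf{W}_2$ are independent implies that
$\mathsf{A}_1\mathbf{W}_1 + \boldsymbol{\mu}_1$ and $\mathsf{A}_2\mathbf{W}_2 + \boldsymbol{\mu}_2$ are independent.


Proposition 23.6.3 (An Affine Transformation of a Gaussian Is a Gaussian).
Let $\mathbf{X}$ be a Gaussian $n$-vector. If $\mathsf{C}$ is a $\nu \times n$ matrix and if $\mathbf{d} \in \mathbb{R}^{\nu}$, then the
random $\nu$-vector $\mathsf{C}\mathbf{X} + \mathbf{d}$ is Gaussian.


Proof. If $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathsf{A}$ is a deterministic $n \times m$ matrix, $\mathbf{W}$ is a standard
Gaussian $m$-vector, and $\boldsymbol{\mu} \in \mathbb{R}^n$, then

\[
\begin{aligned}
\mathsf{C}\mathbf{X} + \mathbf{d} &\stackrel{\mathscr{L}}{=} \mathsf{C}(\mathsf{A}\mathbf{W} + \boldsymbol{\mu}) + \mathbf{d} \\
&= (\mathsf{C}\mathsf{A})\mathbf{W} + (\mathsf{C}\boldsymbol{\mu} + \mathbf{d}),
\end{aligned}
\tag{23.43}
\]

which demonstrates that $\mathsf{C}\mathbf{X} + \mathbf{d}$ is Gaussian, because $(\mathsf{C}\mathsf{A})$ is a deterministic $\nu \times m$
matrix, $\mathbf{W}$ is a standard Gaussian $m$-vector, and $\mathsf{C}\boldsymbol{\mu} + \mathbf{d}$ is a deterministic vector
in $\mathbb{R}^{\nu}$.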
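The bookkeeping in (23.43) is easy to mirror in code. The following sketch assumes NumPy; all numerical values of A, mu, C, and d are hypothetical and chosen only so that the dimensions n = 3, m = 2, and nu = 2 differ. It forms the representation (CA, C mu + d) of the transformed vector and confirms that its covariance matrix equals C K C^T.

```python
import numpy as np

# Representation of X: the law of A W + mu, with n = 3 and m = 2.
A = np.array([[1.0, 0.0],
              [1.0, 2.0],
              [0.0, 1.0]])
mu = np.array([1.0, -1.0, 0.5])

# Affine map x -> C x + d with nu = 2.
C = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0]])
d = np.array([0.0, 3.0])

A_new = C @ A          # matrix in the representation of C X + d
mu_new = C @ mu + d    # mean vector of C X + d

K_X = A @ A.T                # covariance of X, by (23.39)
K_new = A_new @ A_new.T      # covariance of C X + d, by (23.39)

print(A_new, mu_new, K_new, sep="\n")
```

Only the pair (A, mu) changes under the affine map; the standard Gaussian W driving the representation stays the same.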


This proposition has some important consequences. The first is that if we permute 
the components of a Gaussian vector then the resulting vector is also Gaussian. 
This explains why we sometimes say of random variables that they are jointly Gaus- 
sian without specifying an order. Indeed, by the following corollary, the Gaussianity 
of $(X, Y, Z)^{\mathsf{T}}$ is equivalent to the Gaussianity of $(Y, X, Z)^{\mathsf{T}}$, etc.


Corollary 23.6.4. Permuting the components of a Gaussian vector results in a 
Gaussian vector. 


Proof. Follows from Proposition 23.6.3 by choosing C to be the appropriate per- 
mutation matrix, i.e., the matrix that results from permuting the columns of the 
identity matrix. For example, 


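\[ \begin{pmatrix} X^{(3)} \\ X^{(1)} \\ X^{(2)} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ X^{(3)} \end{pmatrix}. \]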


Corollary 23.6.5 (Subsets of Jointly Gaussians Are Jointly Gaussian). Construct- 
ing a random p-vector from a Gaussian n-vector by picking p of its components 
(allowing for repetition) yields a Gaussian vector. 


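Proof. Let $\mathbf{X}$ be a Gaussian $n$-vector. For any choice of $j_1, \dots, j_p \in \{1, \dots, n\}$,
we can express the random $p$-vector $(X^{(j_1)}, \dots, X^{(j_p)})^{\mathsf{T}}$ as $\mathsf{C}\mathbf{X}$, where $\mathsf{C}$ is a deterministic
$p \times n$ matrix whose Row-$\nu$ Column-$\ell$ component is given by

\[ [\mathsf{C}]_{\nu,\ell} = \mathrm{I}\{\ell = j_\nu\}. \]

For example,

\[ \begin{pmatrix} X^{(3)} \\ X^{(1)} \end{pmatrix} = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix} \begin{pmatrix} X^{(1)} \\ X^{(2)} \\ X^{(3)} \end{pmatrix}. \]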

The result thus follows from Proposition 23.6.3. 


Proposition 23.6.6. Each component of a Gaussian vector is a univariate Gaussian.

Proof. Let $\mathbf{X}$ be a Gaussian $n$-vector, and let $j \in \{1, \dots, n\}$ be arbitrary. We
need to show that $X^{(j)}$ is Gaussian. Since $\mathbf{X}$ is Gaussian, there exist an $n \times m$
matrix $\mathsf{A}$, a vector $\boldsymbol{\mu} \in \mathbb{R}^n$, and a standard Gaussian $\mathbf{W}$ such that the vector $\mathbf{X}$
has the same law as the random vector $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$ (Definition 23.1.1). In particular,
the $j$-th component of $\mathbf{X}$ has the same law as the $j$-th component of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, i.e.,

\[ X^{(j)} \stackrel{\mathscr{L}}{=} \sum_{\ell=1}^{m} a^{(j,\ell)} W^{(\ell)} + \mu^{(j)}, \quad j \in \{1, \dots, n\}. \]

The sum on the RHS is a linear combination of the independent univariate Gaussians
$W^{(1)}, \dots, W^{(m)}$ and is thus, by Proposition 19.7.3, Gaussian. The result of
adding $\mu^{(j)}$ is still Gaussian.


We caution the reader that while each component of a Gaussian vector has a 
univariate Gaussian distribution, there exist random vectors that are not Gaussian 
and that yet have Gaussian components. 


23.6.2 The Mean and Covariance Determine the Law of a Gaussian 


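From (23.39) it follows that if $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$ is a standard Gaussian,
then $\boldsymbol{\mu}$ must be equal to $\mathsf{E}[\mathbf{X}]$. Thus, the mean of $\mathbf{X}$ fully determines the vector $\boldsymbol{\mu}$.
The matrix $\mathsf{A}$, however, is not determined by the covariance of $\mathbf{X}$. Indeed, by
(23.39), the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of $\mathbf{X}$ is equal to $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, so $\mathsf{K}_{\mathbf{XX}}$ only determines
the product $\mathsf{A}\mathsf{A}^{\mathsf{T}}$. Since there are many different ways to express $\mathsf{K}_{\mathbf{XX}}$ as the
product of a matrix by its transpose, there are many choices of $\mathsf{A}$ (even of different
dimensions) that result in $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$ having the given covariance matrix. Prima
facie, one might think that these different choices for $\mathsf{A}$ yield different Gaussian
distributions. But this is not the case. In this section we shall show that, while the
choice of $\mathsf{A}$ is not unique, all choices for which $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ is equal to the given covariance
matrix $\mathsf{K}_{\mathbf{XX}}$ give rise to the same distribution.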
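The non-uniqueness of A is easy to exhibit. The sketch below assumes NumPy; the matrix A and the rotation angle are arbitrary illustrative values. It builds two further matrices, one square and one of a different width, that all yield the same product with their transposes.

```python
import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 1.0]])

# Any orthogonal U gives (A U)(A U)^T = A U U^T A^T = A A^T.
theta = 0.7
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A_rot = A @ U

# Appending a zero column changes the dimensions but not A A^T.
A_wide = np.hstack([A, np.zeros((2, 1))])   # a 2 x 3 matrix

K = A @ A.T
print(K)
print(A_rot @ A_rot.T)
print(A_wide @ A_wide.T)
```

All three matrices differ, yet each produces the same covariance matrix, which is why the law of the Gaussian vector can depend on A only through the product of A with its transpose.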


We shall derive this result by computing the characteristic function $\Phi_{\mathbf{X}}(\cdot)$ of a
random $n$-vector $\mathbf{X}$ whose law is equal to the law of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$, $\mathsf{A}$, and $\boldsymbol{\mu}$
are as above and by then showing that $\Phi_{\mathbf{X}}(\cdot)$ depends on $\mathsf{A}$ only via $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, i.e.,
that $\Phi_{\mathbf{X}}(\boldsymbol{\varpi})$ can be computed for every $\boldsymbol{\varpi} \in \mathbb{R}^n$ from $\boldsymbol{\varpi}$, $\mathsf{A}\mathsf{A}^{\mathsf{T}}$, and $\boldsymbol{\mu}$. Since, by
(23.39), $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ is equal to the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ of $\mathbf{X}$, it will follow that the
characteristic functions of all Gaussian vectors of a given mean vector and a given
covariance matrix are identical. Since random vectors of identical characteristic
functions must have identical distributions (Proposition 23.4.4), it will follow that
all Gaussian vectors of a given mean vector and a given covariance matrix have
identical distributions.


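We thus proceed to compute the characteristic function of a random $n$-vector $\mathbf{X}$
whose law is the law of $\mathsf{A}\mathbf{W} + \boldsymbol{\mu}$, where $\mathbf{W}$ is a standard Gaussian $m$-vector, $\mathsf{A}$
is $n \times m$, and $\boldsymbol{\mu} \in \mathbb{R}^n$. By (23.39) it follows that $\mathsf{K}_{\mathbf{XX}} = \mathsf{A}\mathsf{A}^{\mathsf{T}}$. To compute the
characteristic function we need to compute $\mathsf{E}\bigl[e^{i\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}}\bigr]$ for every $\boldsymbol{\varpi} \in \mathbb{R}^n$. From
Proposition 23.6.3 (with the substitution of the $1 \times n$ matrix $\boldsymbol{\varpi}^{\mathsf{T}}$ for $\mathsf{C}$ and of the
scalar zero for $\mathbf{d}$), it follows that $\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}$ is a Gaussian vector with only one component.
By Proposition 23.6.6, this sole component is a univariate Gaussian. Its mean is, by
(23.25a), $\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}$ and its variance is, by (23.30), $\boldsymbol{\varpi}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\varpi}$. Thus,

\[ \boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X} \sim \mathcal{N}\bigl(\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu},\ \boldsymbol{\varpi}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\varpi}\bigr), \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.44} \]

Using the expression (19.29) for the characteristic function of the univariate Gaussian
distribution (with the substitution $\boldsymbol{\varpi}^{\mathsf{T}}\boldsymbol{\mu}$ for $\mu$, the substitution $\boldsymbol{\varpi}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\varpi}$
for $\sigma^2$, and the substitution 1 for $\varpi$), we obtain that the characteristic function
$\Phi_{\mathbf{X}}(\cdot)$, which is defined as $\mathsf{E}\bigl[e^{i\boldsymbol{\varpi}^{\mathsf{T}}\mathbf{X}}\bigr]$, is given by

\[ \Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = e^{-\frac{1}{2} \boldsymbol{\varpi}^{\mathsf{T}} \mathsf{K}_{\mathbf{XX}}\, \boldsymbol{\varpi} + i \boldsymbol{\varpi}^{\mathsf{T}} \boldsymbol{\mu}}, \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.45} \]

Since this characteristic function is fully determined by the mean vector and the
covariance matrix of $\mathbf{X}$, it follows that the distribution is also determined by the
mean and covariance. We have thus proved: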


Theorem 23.6.7 (The Mean and Covariance of a Gaussian Determine its Law). 
Two Gaussian vectors of equal mean vectors and of equal covariance matrices have 
identical distributions. 


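Note 23.6.8. Theorem 23.6.7 and Proposition 23.6.1 combine to prove that for
every $\boldsymbol{\mu} \in \mathbb{R}^n$ and every $n \times n$ positive semidefinite matrix $\mathsf{K}$ there exists one, and
only one, Gaussian distribution of mean $\boldsymbol{\mu}$ and covariance matrix $\mathsf{K}$. We denote
this Gaussian distribution by $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$.


By (23.45) it follows that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ then

\[ \Phi_{\mathbf{X}}(\boldsymbol{\varpi}) = e^{-\frac{1}{2} \boldsymbol{\varpi}^{\mathsf{T}} \mathsf{K}\, \boldsymbol{\varpi} + i \boldsymbol{\varpi}^{\mathsf{T}} \boldsymbol{\mu}}, \quad \boldsymbol{\varpi} \in \mathbb{R}^n. \tag{23.46} \]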


Theorem 23.6.7 has important consequences, one of which has to do with the 
properties of independence and uncorrelatedness. Recall that any two independent 
random variables (of finite mean) are also uncorrelated. The reverse is not in 
general true. But for jointly Gaussians it is: if X and Y are jointly Gaussian, then 
X and Y are independent if, and only if, they are uncorrelated. More generally: 


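Corollary 23.6.9. Let $\mathbf{X}$ be a centered Gaussian $(n_1 + n_2)$-vector. Let the random
$n_1$-vector $\mathbf{X}_1 = (X^{(1)}, \dots, X^{(n_1)})^{\mathsf{T}}$ correspond to its first $n_1$ components, and let
$\mathbf{X}_2 = (X^{(n_1+1)}, \dots, X^{(n_1+n_2)})^{\mathsf{T}}$ correspond to the rest of its components. Then the
vectors $\mathbf{X}_1$ and $\mathbf{X}_2$ are independent if, and only if, they are uncorrelated, i.e., if,
and only if,

\[ \mathsf{E}\bigl[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{23.47} \]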


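Proof. The easy direction, which has nothing to do with Gaussianity, is that if
$\mathbf{X}_1$ and $\mathbf{X}_2$ are centered and independent, then (23.47) holds. Indeed, by the
independence and the fact that the vectors are of zero mean we have

\[
\begin{aligned}
\mathsf{E}\bigl[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}\bigr] &= \mathsf{E}[\mathbf{X}_1]\, \mathsf{E}\bigl[\mathbf{X}_2^{\mathsf{T}}\bigr] \\
&= \mathsf{E}[\mathbf{X}_1]\, \bigl(\mathsf{E}[\mathbf{X}_2]\bigr)^{\mathsf{T}} \\
&= \mathbf{0}\, \mathbf{0}^{\mathsf{T}} \\
&= \mathsf{0}.
\end{aligned}
\]

We now prove the reverse using the Gaussianity. We begin by expressing the
covariance matrix of $\mathbf{X}$ in terms of the covariance matrices of $\mathbf{X}_1$ and $\mathbf{X}_2$ as

\[
\begin{aligned}
\mathsf{K}_{\mathbf{XX}} &= \begin{pmatrix} \mathsf{E}[\mathbf{X}_1\mathbf{X}_1^{\mathsf{T}}] & \mathsf{E}[\mathbf{X}_1\mathbf{X}_2^{\mathsf{T}}] \\ \mathsf{E}[\mathbf{X}_2\mathbf{X}_1^{\mathsf{T}}] & \mathsf{E}[\mathbf{X}_2\mathbf{X}_2^{\mathsf{T}}] \end{pmatrix} \\
&= \begin{pmatrix} \mathsf{K}_{\mathbf{X}_1\mathbf{X}_1} & \mathsf{0} \\ \mathsf{0} & \mathsf{K}_{\mathbf{X}_2\mathbf{X}_2} \end{pmatrix},
\end{aligned}
\tag{23.48}
\]

where the second equality follows from (23.47).

Next, let $\mathbf{X}_1'$ and $\mathbf{X}_2'$ be independent random vectors such that $\mathbf{X}_1' \stackrel{\mathscr{L}}{=} \mathbf{X}_1$ and
$\mathbf{X}_2' \stackrel{\mathscr{L}}{=} \mathbf{X}_2$. Let $\mathbf{X}'$ be the $(n_1 + n_2)$-vector that results from stacking $\mathbf{X}_1'$ on top
of $\mathbf{X}_2'$. Since $\mathbf{X}$ is Gaussian, it follows from Corollary 23.6.5 that $\mathbf{X}_1$ must also
be Gaussian, and since $\mathbf{X}_1'$ has the same law as $\mathbf{X}_1$, it too is Gaussian. Similarly,
$\mathbf{X}_2'$ is also Gaussian. And since $\mathbf{X}_1'$ and $\mathbf{X}_2'$ are, by construction, independent, it
follows from Proposition 23.6.2 that $\mathbf{X}'$ is a centered Gaussian.

Having established that $\mathbf{X}'$ is Gaussian, we next compute its covariance matrix.
Since, by construction, $\mathbf{X}_1'$ and $\mathbf{X}_2'$ are independent and centered,

\[
\begin{aligned}
\mathsf{K}_{\mathbf{X}'\mathbf{X}'} &= \begin{pmatrix} \mathsf{K}_{\mathbf{X}_1'\mathbf{X}_1'} & \mathsf{0} \\ \mathsf{0} & \mathsf{K}_{\mathbf{X}_2'\mathbf{X}_2'} \end{pmatrix} \\
&= \begin{pmatrix} \mathsf{K}_{\mathbf{X}_1\mathbf{X}_1} & \mathsf{0} \\ \mathsf{0} & \mathsf{K}_{\mathbf{X}_2\mathbf{X}_2} \end{pmatrix},
\end{aligned}
\tag{23.49}
\]

where the second equality follows because the equality in law between $\mathbf{X}_1'$ and $\mathbf{X}_1$
implies that $\mathsf{K}_{\mathbf{X}_1'\mathbf{X}_1'} = \mathsf{K}_{\mathbf{X}_1\mathbf{X}_1}$ and similarly for $\mathbf{X}_2'$.

Comparing (23.49) and (23.48) we conclude that $\mathbf{X}$ and $\mathbf{X}'$ are centered Gaussians
of identical covariance matrices. Consequently, by Theorem 23.6.7, $\mathbf{X}' \stackrel{\mathscr{L}}{=} \mathbf{X}$. And
since the first $n_1$ components of $\mathbf{X}'$ are independent of its last $n_2$ components, the
same must also be true for $\mathbf{X}$.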


Corollary 23.6.10. If the components of the Gaussian random vector X are uncor- 
related and the matrix Kxx is therefore diagonal, then the components of X are 
independent. 


Proof. By repeated application of Corollary 23.6.9. 


Another consequence of the fact that there is only one multivariate Gaussian distribution
of a given mean vector and of a given covariance matrix has to do with pairwise
independence and independence. Recall that the random variables $X_1, \dots, X_n$
are pairwise independent if for each pair of distinct indices $\nu', \nu'' \in \{1, \dots, n\}$
the random variables $X_{\nu'}$ and $X_{\nu''}$ are independent, i.e., if for all such $\nu', \nu''$ and
all $\xi_{\nu'}, \xi_{\nu''} \in \mathbb{R}$

\[ \Pr\bigl[X_{\nu'} \le \xi_{\nu'},\, X_{\nu''} \le \xi_{\nu''}\bigr] = \Pr\bigl[X_{\nu'} \le \xi_{\nu'}\bigr]\, \Pr\bigl[X_{\nu''} \le \xi_{\nu''}\bigr]. \tag{23.50} \]

The random variables $X_1, \dots, X_n$ are independent if for all $\xi_1, \dots, \xi_n$ in $\mathbb{R}$

\[ \Pr\bigl[X_j \le \xi_j \ \text{for all} \ j \in \{1, \dots, n\}\bigr] = \prod_{j=1}^{n} \Pr\bigl[X_j \le \xi_j\bigr]. \tag{23.51} \]

Independence implies pairwise independence, but the two are not equivalent. One
can find triplets of random variables that are pairwise independent but not independent.⁹
But if $X_1, \dots, X_n$ are jointly Gaussian, then pairwise independence is
equivalent to independence:


Corollary 23.6.11. If the components of a Gaussian random vector are pairwise 
independent, then they are independent. 


Proof. If the components of the Gaussian $n$-vector $\mathbf{X}$ are pairwise independent,
then they are pairwise uncorrelated and the covariance matrix $\mathsf{K}_{\mathbf{XX}}$ must be diagonal.
Denote the diagonal elements by $\lambda_1, \dots, \lambda_n$. Let $\boldsymbol{\mu}$ be the mean vector of $\mathbf{X}$.
Another Gaussian vector of this mean and of this covariance matrix is the Gaussian
vector whose components are independent $\mathcal{N}(\mu^{(j)}, \lambda_j)$. Since the mean and covariance
determine the distribution of Gaussian vectors, it follows that the two vectors,
in fact, have identical laws, so the components of $\mathbf{X}$ are also independent.
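That Gaussianity is essential here can be seen from the classical non-Gaussian triple in which X and Y are IID, each taking on the values ±1 equiprobably, and Z is their product. The sketch below uses only the Python standard library and checks the claim by exhaustive enumeration; the helper names `prob` and `pair_independent` are hypothetical. Because the variables are discrete, independence is checked through probability mass functions rather than through (23.50) and (23.51) directly.

```python
from fractions import Fraction
from itertools import product

# The four equiprobable outcomes of (X, Y), extended by Z = X * Y.
outcomes = [(x, y, x * y) for x, y in product((-1, 1), repeat=2)]
p = Fraction(1, len(outcomes))

def prob(event):
    """Probability of the set of outcomes on which `event` is true."""
    return sum(p for o in outcomes if event(o))

def pair_independent(i, j):
    """Check Pr[V_i = a, V_j = b] = Pr[V_i = a] Pr[V_j = b] for all a, b."""
    return all(
        prob(lambda o: o[i] == a and o[j] == b)
        == prob(lambda o: o[i] == a) * prob(lambda o: o[j] == b)
        for a, b in product((-1, 1), repeat=2)
    )

pairwise = all(pair_independent(i, j) for i, j in [(0, 1), (0, 2), (1, 2)])

# Joint independence would require Pr[X = 1, Y = 1, Z = 1] = (1/2)^3.
joint = prob(lambda o: o == (1, 1, 1))
product_of_marginals = Fraction(1, 8)

print(pairwise, joint, product_of_marginals)
```

Every pair passes the factorization test, but the joint probability of the outcome (1, 1, 1) is 1/4 rather than 1/8, so the three variables are not independent.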


Corollary 23.6.12. If $\mathbf{W}$ is a standard Gaussian $n$-vector, and if $\mathsf{U}$ is an $n \times n$
orthogonal matrix, then $\mathsf{U}\mathbf{W}$ is also a standard Gaussian vector.


Proof. By Definition 23.1.1 it follows that the random vector UW is a centered 
Gaussian. By (23.39) we obtain that the orthogonality of the matrix U implies 
that the covariance matrix of this centered Gaussian is the identity matrix, which 
is also the covariance matrix of W; see (23.37). Consequently, UW and W are 
two centered Gaussian vectors of identical covariance matrices and hence, by The- 
orem 23.6.7, of equal law. Since W is standard, this implies that UW must also 
be standard. 


The next corollary shows that if $\mathbf{X}$ is a centered Gaussian $n$-vector, then $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$
for a standard Gaussian $n$-vector $\mathbf{W}$ and some square matrix $\mathsf{A}$. That is, if the
law of an $n$-vector $\mathbf{X}$ is equal to the law of $\mathsf{A}\mathbf{W}$ where $\mathsf{A}$ is an $n \times m$ matrix and
where $\mathbf{W}$ is a standard Gaussian $m$-vector, then the law of $\mathbf{X}$ is also identical to
the law of $\mathsf{A}'\mathbf{W}'$, where $\mathsf{A}'$ is some $n \times n$ matrix and where $\mathbf{W}'$ is a standard Gaussian
$n$-vector. Consequently, we could have required in Definition 23.1.1 that the matrix
$\mathsf{A}$ be square without changing the set of distributions that we define as Gaussian.


Corollary 23.6.13. If $\mathbf{X}$ is a centered Gaussian $n$-vector, then there exists a deterministic
square $n \times n$ matrix $\mathsf{A}$ such that $\mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{A}\mathbf{W}$, where $\mathbf{W}$ is a standard
Gaussian $n$-vector.


Proof. Let $\mathsf{K}_{\mathbf{XX}}$ denote the covariance matrix of $\mathbf{X}$. Being a covariance matrix,
$\mathsf{K}_{\mathbf{XX}}$ must be positive semidefinite (Proposition 23.6.1). Consequently, by Proposition
23.3.7, there exists some $n \times n$ matrix $\mathsf{S}$ such that

\[ \mathsf{K}_{\mathbf{XX}} = \mathsf{S}^{\mathsf{T}}\mathsf{S}. \tag{23.52} \]

Consider now the centered Gaussian $\mathsf{S}^{\mathsf{T}}\mathbf{W}$, where $\mathbf{W}$ is a standard Gaussian $n$-vector.
By (23.39), the covariance matrix of $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ is $\mathsf{S}^{\mathsf{T}}\mathsf{S}$, which by (23.52) is
equal to $\mathsf{K}_{\mathbf{XX}}$. Thus $\mathbf{X}$ and $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ are centered Gaussians of the same covariance,
and so they must be of the same law. We have thus established that the law
of $\mathbf{X}$ is the same as the law of the product of a square matrix ($\mathsf{S}^{\mathsf{T}}$) by a standard
Gaussian ($\mathbf{W}$).


⁹ A classical example is the triple $X, Y, Z$ where $X$ and $Y$ are IID each taking on the values
$\pm 1$ equiprobably and where $Z$ is their product.


23.6.3 A Canonical Representation of a Centered Gaussian


The representation of a centered Gaussian vector as the result of the multiplication 
of a deterministic matrix by a standard Gaussian vector is not unique. Indeed, 
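whenever the $n \times m$ matrix $\mathsf{A}$ satisfies $\mathsf{A}\mathsf{A}^{\mathsf{T}} = \mathsf{K}$ it follows that if $\mathbf{W}$ is a standard
Gaussian $m$-vector, then $\mathsf{A}\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$. (This follows because $\mathsf{A}\mathbf{W}$ is a random
$n$-vector of covariance matrix $\mathsf{A}\mathsf{A}^{\mathsf{T}}$ (23.39); it is, by Definition 23.1.1, a centered
Gaussian; and all centered Gaussians of a given covariance matrix have the same
law.) We saw in Corollary 23.6.13 that $\mathsf{A}$ can always be chosen as a square matrix.
Thus, to every $\mathsf{K} \succeq 0$ there exists a square matrix $\mathsf{A}$ such that $\mathsf{A}\mathbf{W} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$. In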
this section we shall focus on a particular choice of the matrix A that is useful in 
the analysis of Gaussian vectors. In this representation A is a square matrix that 
can be written as the product of an orthogonal matrix by a diagonal matrix. The 
diagonal matrix acts on W by stretching and shrinking its components, and the 
orthogonal matrix then rotates (and possibly reflects) the result. 


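Theorem 23.6.14 (A Canonical Representation of a Gaussian Vector). Let $\mathbf{X}$ be
a centered Gaussian $n$-vector of covariance matrix $\mathsf{K}_{\mathbf{XX}}$. Then

\[ \mathbf{X} \stackrel{\mathscr{L}}{=} \mathsf{U}\Lambda^{1/2}\mathbf{W}, \]

where $\mathbf{W}$ is a standard Gaussian $n$-vector; the $n \times n$ matrix $\mathsf{U}$ is orthogonal; the
$n \times n$ matrix $\Lambda$ is diagonal; the diagonal elements of $\Lambda$ are the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$;
and the $j$-th column of $\mathsf{U}$ is an eigenvector corresponding to the eigenvalue of $\mathsf{K}_{\mathbf{XX}}$
that is equal to the $j$-th diagonal element of $\Lambda$.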


Proof. By Proposition 23.6.1, $\mathsf{K}_{\mathbf{XX}}$ is positive semidefinite and a fortiori symmetric.
Consequently, by Proposition 23.3.6, there exists a diagonal matrix $\Lambda$
whose diagonal elements are the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$ and there exists an orthogonal
matrix $\mathsf{U}$ such that $\mathsf{K}_{\mathbf{XX}}\mathsf{U} = \mathsf{U}\Lambda$, so the $j$-th column of $\mathsf{U}$ is an eigenvector
corresponding to the eigenvalue given by the $j$-th diagonal element of $\Lambda$. Since
$\mathsf{K}_{\mathbf{XX}} \succeq 0$, it follows that all its eigenvalues are nonnegative, and we can define the
matrix $\Lambda^{1/2}$ as the matrix whose components are the componentwise nonnegative
square roots of the matrix $\Lambda$. As in (23.21), choose $\mathsf{S} = \Lambda^{1/2}\mathsf{U}^{\mathsf{T}}$. We then have
that $\mathsf{K}_{\mathbf{XX}} = \mathsf{S}^{\mathsf{T}}\mathsf{S}$. If $\mathbf{W}$ is a standard Gaussian, then $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ is a centered Gaussian
of covariance $\mathsf{S}^{\mathsf{T}}\mathsf{S}$. Since $\mathsf{S}^{\mathsf{T}}\mathsf{S} = \mathsf{K}_{\mathbf{XX}}$ and since there is only one
centered multivariate Gaussian distribution of a given covariance matrix, it follows
that the law of $\mathsf{S}^{\mathsf{T}}\mathbf{W}$ ($= \mathsf{U}\Lambda^{1/2}\mathbf{W}$) is the same as the law of $\mathbf{X}$.


Corollary 23.6.15. A centered Gaussian vector can be expressed as the result of
an orthogonal transformation applied to a random vector whose components are
independent centered univariate Gaussians of different variances. These variances
are the eigenvalues of the covariance matrix.

Figure 23.1: Contour plot of the density of four different two-dimensional Gaussian
random variables: from left to right and top to bottom $\mathbf{X}_1, \dots, \mathbf{X}_4$.


Proof. Because the matrix $\Lambda$ in the theorem is diagonal, we can write $\Lambda^{1/2}\mathbf{W}$ as

\[ \Lambda^{1/2}\mathbf{W} = \begin{pmatrix} \sqrt{\lambda_1}\, W^{(1)} \\ \vdots \\ \sqrt{\lambda_n}\, W^{(n)} \end{pmatrix}, \]

where $\lambda_1, \dots, \lambda_n$ are the diagonal elements of $\Lambda$, i.e., the eigenvalues of $\mathsf{K}_{\mathbf{XX}}$. Thus,
the random vector $\Lambda^{1/2}\mathbf{W}$ has independent components with the $\nu$-th component
being $\mathcal{N}(0, \lambda_\nu)$.
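The canonical representation translates directly into a sampling recipe: scale independent standard Gaussians by the square roots of the eigenvalues, then apply the orthogonal matrix. The following is a minimal sketch assuming NumPy; the covariance matrix K and the number of draws are hypothetical, and `numpy.linalg.eigh` is used here as one convenient way of obtaining an orthogonal U and the eigenvalues.

```python
import numpy as np

K = np.array([[2.5, 1.5],
              [1.5, 2.5]])              # hypothetical covariance matrix

eigvals, U = np.linalg.eigh(K)          # K U = U diag(eigvals), U orthogonal
Lam_sqrt = np.diag(np.sqrt(eigvals))    # Lambda^{1/2}
A = U @ Lam_sqrt                        # X has the law of A W

rng = np.random.default_rng(2)
W = rng.standard_normal((2, 100_000))   # columns are standard Gaussian 2-vectors
V = Lam_sqrt @ W        # independent components with variances eigvals
X = U @ V               # rotate (and possibly reflect) to obtain draws of X

print(eigvals)
print(np.round(np.cov(X), 2))
```

For this choice of K the eigenvalues are 1 and 4, so the intermediate vector V has independent components of variances 1 and 4, and the sample covariance of X is close to K.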


Figures 23.1 and 23.2 demonstrate this canonical representation. They depict the 
contour lines and mesh plots of the density functions of the following four two- 
dimensional Gaussian vectors: 


1 O 
Xy = ( _ WwW, Kx,x, =le, 


2 0 4 0 
Xo = & . WwW, Kx5x5 = (i : 5) 


1 0 1 0
Figure 23.2: Mesh plots of the density functions of Gaussian random vectors: from
left to right and top to bottom $\mathbf{X}_1, \dots, \mathbf{X}_4$.

where $\mathbf{W}$ is a standard Gaussian vector with two components.


Theorem 23.6.14 can be used to find a linear transformation that transforms a 
given Gaussian vector to a standard Gaussian. The following is the multivariate 
version of the univariate result showing that if $X \sim \mathcal{N}(\mu, \sigma^2)$, where $\sigma^2 > 0$, then
$(X - \mu)/\sigma$ has a $\mathcal{N}(0, 1)$ distribution (19.8).


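Proposition 23.6.16 (From Gaussians to Standard Gaussians). Let the random
$n$-vector $\mathbf{X}$ be $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$, where $\mathsf{K} \succ 0$ and $\boldsymbol{\mu} \in \mathbb{R}^n$. Let the $n \times n$ matrices $\Lambda$ and $\mathsf{U}$
be such that $\Lambda$ is diagonal, $\mathsf{U}$ is orthogonal, and $\mathsf{K}\mathsf{U} = \mathsf{U}\Lambda$. Then

\[ \Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu}) \sim \mathcal{N}(\mathbf{0}, \mathsf{I}_n), \]

where $\Lambda^{-1/2}$ is the diagonal matrix whose diagonal entries are the reciprocals of the
square roots of the diagonal elements of $\Lambda$.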


Proof. Since an affine transformation of a Gaussian vector is Gaussian (Proposition
23.6.3), it follows that $\Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu})$ is a Gaussian vector. And since the
mean and covariance of a Gaussian vector fully specify its law (Theorem 23.6.7),
the result will follow once we show that the mean of $\Lambda^{-1/2}\mathsf{U}^{\mathsf{T}}(\mathbf{X} - \boldsymbol{\mu})$ is the zero
vector and its covariance matrix is the identity matrix. This can be readily verified
using (23.39).
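The verification left to the reader amounts to checking that the transformation matrix maps K to the identity. A minimal numerical sketch, assuming NumPy and using a hypothetical positive definite K and mean vector, is given below; the function name `whiten` is an illustrative choice.

```python
import numpy as np

K = np.array([[4.0, 1.0],
              [1.0, 3.0]])             # hypothetical positive definite covariance
mu = np.array([1.0, -2.0])             # hypothetical mean vector

eigvals, U = np.linalg.eigh(K)         # K U = U Lambda with U orthogonal
M = np.diag(eigvals ** -0.5) @ U.T     # Lambda^{-1/2} U^T

def whiten(x):
    """Map a realization x of X to Lambda^{-1/2} U^T (x - mu)."""
    return M @ (x - mu)

# Covariance of M (X - mu) is M K M^T, which should be the identity.
cov_after = M @ K @ M.T

print(np.round(cov_after, 12))
print(whiten(mu))
```

The mean is mapped to the zero vector and the covariance to the identity matrix, which is what the proposition requires of a standard Gaussian.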


23.6.4 The Density of a Gaussian Vector 


As we saw in Corollary 23.4.2, if the covariance matrix of a centered vector is 
singular, then at least one of its components can be expressed as a deterministic 
linear combination of its other components. Consequently, random vectors with 
singular covariance matrices cannot have a density. If the covariance matrix is 
nonsingular, then the vector may or may not have a density. If it is Gaussian, then 
it does. In this section we shall derive the density of the multivariate Gaussian 
distribution when the covariance matrix is nonsingular. 


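We begin with the centered case. To derive the density of a centered Gaussian
$n$-vector of positive definite covariance matrix $\mathsf{K}$ we shall use Theorem 23.6.14 to
represent the $\mathcal{N}(\mathbf{0}, \mathsf{K})$ distribution as the distribution of $\mathsf{U}\Lambda^{1/2}\mathbf{W}$, where $\mathsf{U}$ is an
orthogonal matrix and $\Lambda$ is a diagonal matrix satisfying $\mathsf{K}\mathsf{U} = \mathsf{U}\Lambda$. Note that $\Lambda$
is nonsingular because its diagonal elements are the eigenvalues of $\mathsf{K}$, which we
assume to be positive definite.
Let

\[ \mathsf{B} = \mathsf{U}\Lambda^{1/2}, \tag{23.53} \]

so the density we are after is the density of $\mathsf{B}\mathbf{W}$. Note that, by (23.53),

\[ \mathsf{B}\mathsf{B}^{\mathsf{T}} = \mathsf{U}\Lambda^{1/2}\Lambda^{1/2}\mathsf{U}^{\mathsf{T}} = \mathsf{U}\Lambda\mathsf{U}^{\mathsf{T}} = \mathsf{K}. \tag{23.54} \]

Also, by (23.54),

\[
\begin{aligned}
|\det(\mathsf{B})| &= \sqrt{\det(\mathsf{B})\,\det(\mathsf{B})} \\
&= \sqrt{\det(\mathsf{B})\,\det(\mathsf{B}^{\mathsf{T}})} \\
&= \sqrt{\det(\mathsf{B}\mathsf{B}^{\mathsf{T}})} \\
&= \sqrt{\det(\mathsf{K})},
\end{aligned}
\tag{23.55}
\]

where the first equality follows by expressing $|x|$ as $\sqrt{x^2}$; the second follows because
a square matrix and its transpose have the same determinant; the third because the
determinant of the product of square matrices is the product of the determinants;
and where the last equality follows from (23.54).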


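Using the formula for computing the density of $\mathsf{B}\mathbf{W}$ from that of $\mathbf{W}$
(Theorem 17.3.4), we have that if $\mathbf{X} = \mathsf{B}\mathbf{W}$, then

\[
\begin{aligned}
f_{\mathbf{X}}(\mathbf{x}) &= \frac{f_{\mathbf{W}}(\mathsf{B}^{-1}\mathbf{x})}{|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\tfrac{1}{2}(\mathsf{B}^{-1}\mathbf{x})^{\mathsf{T}}(\mathsf{B}^{-1}\mathbf{x})\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}(\mathsf{B}^{-1})^{\mathsf{T}}(\mathsf{B}^{-1}\mathbf{x})\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}(\mathsf{B}\mathsf{B}^{\mathsf{T}})^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\,|\det(\mathsf{B})|} \\
&= \frac{\exp\bigl(-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{(2\pi)^{n/2}\sqrt{\det(\mathsf{K})}},
\end{aligned}
\]

where the second equality follows from the density of the standard Gaussian (23.36);
the third from the rule for the transpose of the product of matrices (23.3); the fourth
from the representation of the inverse of the product of matrices as the product
of the inverses in reverse order, $(\mathsf{A}\mathsf{B})^{-1} = \mathsf{B}^{-1}\mathsf{A}^{-1}$, and because transposition and
inversion commute; the fifth from (23.54); and the sixth from (23.55). It follows
that if $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathsf{K})$ where $\mathsf{K}$ is nonsingular, then

\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\bigl(-\tfrac{1}{2}\mathbf{x}^{\mathsf{T}}\mathsf{K}^{-1}\mathbf{x}\bigr)}{\sqrt{(2\pi)^n \det(\mathsf{K})}}, \quad \mathbf{x} \in \mathbb{R}^n. \]

Accounting for the mean, we have that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ where $\mathsf{K}$ is nonsingular,
then

\[ f_{\mathbf{X}}(\mathbf{x}) = \frac{\exp\bigl(-\tfrac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}}\mathsf{K}^{-1}(\mathbf{x} - \boldsymbol{\mu})\bigr)}{\sqrt{(2\pi)^n \det(\mathsf{K})}}, \quad \mathbf{x} \in \mathbb{R}^n. \tag{23.56} \]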
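The chain of equalities leading to (23.56) can be sanity-checked numerically. The sketch below assumes NumPy; the function names and the example matrix are hypothetical and chosen only for illustration. It evaluates the density once through the closed form (23.56) and once through the change-of-variables expression $f_{\mathbf{W}}(\mathsf{B}^{-1}\mathbf{x})/|\det(\mathsf{B})|$ with $\mathsf{B} = \mathsf{U}\Lambda^{1/2}$.

```python
import numpy as np

def gaussian_density(x, mu, K):
    """Evaluate the closed form (23.56) at x for a nonsingular covariance K."""
    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(K, d)          # (x - mu)^T K^{-1} (x - mu)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** n * np.linalg.det(K))

def standard_gaussian_density(w):
    """Density of a standard Gaussian vector at w."""
    return np.exp(-0.5 * (w @ w)) / (2 * np.pi) ** (len(w) / 2)

# Example positive definite covariance matrix (an assumption for illustration).
K = np.array([[2.0, 0.6],
              [0.6, 1.0]])
eigvals, U = np.linalg.eigh(K)                # K U = U diag(eigvals)
B = U @ np.diag(np.sqrt(eigvals))             # B = U Lambda^{1/2}, as in (23.53)

x = np.array([0.3, -1.1])
via_formula = gaussian_density(x, np.zeros(2), K)
via_change_of_variables = (
    standard_gaussian_density(np.linalg.solve(B, x)) / abs(np.linalg.det(B))
)
print(via_formula, via_change_of_variables)
```

The two printed values should coincide up to rounding; the same script also lets one confirm (23.54) and (23.55) by comparing `B @ B.T` with `K` and `abs(det(B))` with `sqrt(det(K))`.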


23.6.5 Linear Functionals of Gaussian Vectors 


A linear functional on $\mathbb{R}^n$ is a linear mapping from $\mathbb{R}^n$ to $\mathbb{R}$. For example, if $\boldsymbol{\alpha}$
is any fixed vector in $\mathbb{R}^n$, then the mapping

\[ \mathbf{x} \mapsto \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{x} \tag{23.57} \]

is a linear functional on $\mathbb{R}^n$. In fact, every linear functional on $\mathbb{R}^n$
has this form. This can be proved by using linearity to verify that we can choose
the $j$-th component of $\boldsymbol{\alpha}$ to equal the result of applying the linear functional to
the vector $\mathbf{e}_j$ whose components are all zero except for its $j$-th component, which
is equal to one.
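The construction of the representing vector can be sketched in a few lines. This is only an illustration, assuming NumPy; `representing_vector` and `example_functional` are hypothetical names, and the code takes the linearity of the supplied function on trust rather than checking it.

```python
import numpy as np

def representing_vector(functional, n):
    """Return alpha such that functional(x) == alpha @ x.

    Assumes `functional` is linear on R^n; the j-th component of alpha is
    obtained by applying the functional to the j-th unit vector e_j.
    """
    return np.array([functional(e_j) for e_j in np.eye(n)])

def example_functional(x):
    # A hypothetical linear functional on R^3, used only for illustration.
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

alpha = representing_vector(example_functional, 3)
x = np.array([1.0, 4.0, -2.0])
print(alpha, example_functional(x), alpha @ x)
```

By linearity, the functional applied to $\mathbf{x} = \sum_j x^{(j)}\mathbf{e}_j$ equals $\sum_j x^{(j)}$ times its value at $\mathbf{e}_j$, which is why the last two printed numbers agree.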


If $\mathbf{X}$ is a Gaussian $n$-vector and if $\boldsymbol{\alpha} \in \mathbb{R}^n$, then, by Proposition 23.6.3 (applied
with the substitution of the $1 \times n$ matrix $\boldsymbol{\alpha}^{\mathsf{T}}$ for $\mathsf{C}$), it follows that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$ is a Gaussian
vector with only one component. By Proposition 23.6.6, this sole component
must have a univariate Gaussian distribution. We thus conclude that the result of
applying a linear functional to a Gaussian vector is a Gaussian random variable.

We next show that the reverse is also true: if $\mathbf{X}$ is of mean $\boldsymbol{\mu}$ and of covariance
matrix $\mathsf{K}$ and if the result of applying every linear functional to $\mathbf{X}$ has a univariate
Gaussian distribution, then $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$.$^{10}$ To prove this result we compute
the characteristic function of $\mathbf{X}$. For every $\boldsymbol{\varpi} \in \mathbb{R}^n$ the mapping $\mathbf{x} \mapsto \boldsymbol{\varpi}^{\mathsf{T}}\mathbf{x}$ is
a linear functional on $\mathbb{R}^n$. Consequently, our assumption that the result of the
application of every linear functional to $\mathbf{X}$ has a univariate Gaussian distribution
implies (23.44). From here we can follow the steps leading to (23.46) to conclude
that the characteristic function of $\mathbf{X}$ must be given by the RHS of (23.46). Since
this is also the characteristic function of a $\mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$ random vector, it follows that
$\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \mathsf{K})$, because random vectors of identical characteristic functions must
have identical distributions (Proposition 23.4.4). We have thus proved:


Theorem 23.6.17 (Gaussian Vectors and Linear Functionals). A random vector X 
is Gaussian if, and only if, every linear functional of X has a univariate Gaussian 
distribution. 


23.7 Jointly Gaussian Vectors 


Three miracles occur when we compute the conditional distribution of X given 
Y = y for jointly Gaussian random vectors X and Y. Before describing these 
miracles we need to define jointly Gaussian vectors. 


Definition 23.7.1 (Jointly Gaussian Vectors). Two random vectors are said to be 
jointly Gaussian if the vector that results when one is stacked on top of the other 
is Gaussian. 


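That is, the random $n_x$-vector $\mathbf{X} = (X^{(1)}, \ldots, X^{(n_x)})^{\mathsf{T}}$ and the random $n_y$-vector
$\mathbf{Y} = (Y^{(1)}, \ldots, Y^{(n_y)})^{\mathsf{T}}$ are jointly Gaussian if the random $(n_x + n_y)$-vector

\[ \bigl(X^{(1)}, \ldots, X^{(n_x)}, Y^{(1)}, \ldots, Y^{(n_y)}\bigr)^{\mathsf{T}} \]

is Gaussian.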

By Corollary 23.6.5, the random vectors X and Y can only be jointly Gaussian if 
each is Gaussian. But this is not enough: both X and Y can be Gaussian without 
them being jointly Gaussian. However, if X and Y are independent Gaussian 
vectors, then, by Proposition 23.6.2, they are jointly Gaussian. 


Proposition 23.7.2. Independent Gaussian vectors are jointly Gaussian. 


By Corollary 23.6.9 we have: 


Proposition 23.7.3. If two jointly Gaussian random vectors are uncorrelated, then 
they are independent. 


$^{10}$It is not difficult to show that the assumption that $\mathbf{X}$ is of finite variance is not necessary.
If every linear functional of $\mathbf{X}$ is of finite variance, then $\mathbf{X}$ must be of finite variance. Thus, we
could have stated the result as follows: if a random vector is such that the result of applying
every linear functional to it is a univariate Gaussian, then it is a multivariate Gaussian.
Having defined jointly Gaussian random vectors we next turn to the main result of
this section. Loosely speaking, it states that if $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian, then
in computing the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ three miracles occur:


(i) the conditional distribution is a multivariate Gaussian;
(ii) its mean vector is an affine function of $\mathbf{y}$;
(iii) and its covariance matrix does not depend on $\mathbf{y}$.


Before stating this more formally, we justify two simplifying assumptions. The first 
assumption is that the covariance matrix of Y is nonsingular, so 


\[ \mathsf{K}_{\mathbf{YY}} \succ 0. \]


The reason is that if the covariance matrix of Y is singular, then, by Corol- 
lary 23.4.2, some of its components are with probability one affine functions of 
the others, and we then have to consider two cases. If the realization y satisfies 
these affine relations, then we can just pick a subset of the components of Y that 
determine all the other components and that have a nonsingular covariance matrix 
as in Section 23.4.3 and ignore the other components of y; the ignored components 
do not alter the conditional distribution of X given Y = y. The other case where 
the realization y does not satisfy the relations that Y satisfies with probability one 
can be ignored because it occurs with probability zero. 


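The second assumption we make is that both $\mathbf{X}$ and $\mathbf{Y}$ are centered. There is no
loss in generality in making this assumption for the following reason. Conditioning
on $\mathbf{Y} = \mathbf{y}$ when $\mathbf{Y}$ has mean $\boldsymbol{\mu}_y$ is equivalent to conditioning on $\mathbf{Y} - \boldsymbol{\mu}_y = \mathbf{y} - \boldsymbol{\mu}_y$.
And if $\mathbf{X}$ has mean $\boldsymbol{\mu}_x$, then we can compute the conditional distribution of $\mathbf{X}$ by
computing the conditional distribution of $\mathbf{X} - \boldsymbol{\mu}_x$ and by then shifting the resulting
distribution by $\boldsymbol{\mu}_x$. Thus, the conditional density $f_{\mathbf{X}|\mathbf{Y}=\mathbf{y}}(\cdot)$ is given by

\[ f_{\mathbf{X}|\mathbf{Y}=\mathbf{y}}(\mathbf{x}) = f_{\mathbf{X}-\boldsymbol{\mu}_x \,|\, \mathbf{Y}-\boldsymbol{\mu}_y = \mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x} - \boldsymbol{\mu}_x), \tag{23.58} \]

where $\mathbf{X} - \boldsymbol{\mu}_x$ and $\mathbf{Y} - \boldsymbol{\mu}_y$ are jointly Gaussian and centered whenever $\mathbf{X}$ and $\mathbf{Y}$ are
jointly Gaussian. It is now straightforward to verify that if the miracles hold for
the centered case

\[ \mathbf{x} \mapsto f_{\mathbf{X}-\boldsymbol{\mu}_x \,|\, \mathbf{Y}-\boldsymbol{\mu}_y = \mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x}), \]

then they also hold for the general case

\[ \mathbf{x} \mapsto f_{\mathbf{X}-\boldsymbol{\mu}_x \,|\, \mathbf{Y}-\boldsymbol{\mu}_y = \mathbf{y}-\boldsymbol{\mu}_y}(\mathbf{x} - \boldsymbol{\mu}_x). \]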


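Theorem 23.7.4. Let $\mathbf{X}$ and $\mathbf{Y}$ be centered and jointly Gaussian with covariance
matrices $\mathsf{K}_{\mathbf{XX}}$ and $\mathsf{K}_{\mathbf{YY}}$. Assume that $\mathsf{K}_{\mathbf{YY}} \succ 0$. Then the conditional distribution
of $\mathbf{X}$ conditional on $\mathbf{Y} = \mathbf{y}$ is a multivariate Gaussian of mean

\[ \mathsf{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\,\mathsf{K}_{\mathbf{YY}}^{-1}\,\mathbf{y} \tag{23.59} \]

and covariance matrix

\[ \mathsf{K}_{\mathbf{XX}} - \mathsf{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\,\mathsf{K}_{\mathbf{YY}}^{-1}\,\mathsf{E}\bigl[\mathbf{Y}\mathbf{X}^{\mathsf{T}}\bigr]. \tag{23.60} \]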
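Proof. Let $n_x$ and $n_y$ denote the number of components of $\mathbf{X}$ and $\mathbf{Y}$. Let $\mathsf{D}$ be
any deterministic real $n_x \times n_y$ matrix. Then clearly

\[ \mathbf{X} = \mathsf{D}\mathbf{Y} + (\mathbf{X} - \mathsf{D}\mathbf{Y}). \tag{23.61} \]

Since $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian, the vector $(\mathbf{X}^{\mathsf{T}}, \mathbf{Y}^{\mathsf{T}})^{\mathsf{T}}$ is Gaussian.
Consequently, since

\[
\begin{pmatrix} \mathbf{X} - \mathsf{D}\mathbf{Y} \\ \mathbf{Y} \end{pmatrix}
=
\begin{pmatrix} \mathsf{I}_{n_x} & -\mathsf{D} \\ \mathsf{0} & \mathsf{I}_{n_y} \end{pmatrix}
\begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix},
\]

it follows from Proposition 23.6.3 that

\[ (\mathbf{X} - \mathsf{D}\mathbf{Y}) \text{ and } \mathbf{Y} \text{ are centered and jointly Gaussian.} \tag{23.62} \]

Suppose now that the matrix $\mathsf{D}$ is chosen so that $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ and $\mathbf{Y}$ are uncorrelated:

\[ \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{Y}^{\mathsf{T}}\bigr] = \mathsf{0}. \tag{23.63} \]

By (23.62) and Proposition 23.7.3 it then follows that the random vector $(\mathbf{X} - \mathsf{D}\mathbf{Y})$
is independent of $\mathbf{Y}$. Consequently, with this choice of $\mathsf{D}$ we have that (23.61)
expresses $\mathbf{X}$ as the sum of two terms where the first, $\mathsf{D}\mathbf{Y}$, is fully determined
by $\mathbf{Y}$ and where the second, $(\mathbf{X} - \mathsf{D}\mathbf{Y})$, is independent of $\mathbf{Y}$. It follows that
the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ is the same as the distribution of
$(\mathbf{X} - \mathsf{D}\mathbf{Y})$ but shifted by $\mathsf{D}\mathbf{y}$. By (23.62) and Corollary 23.6.5, $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ is a
centered Gaussian, so the conditional distribution of $\mathbf{X}$ given $\mathbf{Y} = \mathbf{y}$ is that of the
centered Gaussian $(\mathbf{X} - \mathsf{D}\mathbf{Y})$ shifted by the vector $\mathsf{D}\mathbf{y}$. This already establishes
the three "miracles" we discussed before: the conditional distribution of $\mathbf{X}$ given
$\mathbf{Y} = \mathbf{y}$ is Gaussian; its mean $\mathsf{D}\mathbf{y}$ is a linear function of $\mathbf{y}$; and its covariance matrix,
which is the covariance matrix of $(\mathbf{X} - \mathsf{D}\mathbf{Y})$, does not depend on the realization $\mathbf{y}$
of $\mathbf{Y}$.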


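The remaining claims, namely that the mean of the conditional distribution is as
given in (23.59) and that the covariance matrix is as given in (23.60), now follow
from straightforward calculations. Indeed, by solving (23.63) for $\mathsf{D}$ we obtain

\[ \mathsf{D} = \mathsf{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\,\mathsf{K}_{\mathbf{YY}}^{-1}, \tag{23.64} \]

so $\mathsf{D}\mathbf{y}$ is given by (23.59). To show that the covariance of the conditional law of $\mathbf{X}$
given $\mathbf{Y} = \mathbf{y}$ is as given in (23.60), we note that this covariance is the covariance
of $(\mathbf{X} - \mathsf{D}\mathbf{Y})$, which is given by

\[
\begin{aligned}
\mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})(\mathbf{X} - \mathsf{D}\mathbf{Y})^{\mathsf{T}}\bigr]
&= \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}\bigr] - \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})(\mathsf{D}\mathbf{Y})^{\mathsf{T}}\bigr] \\
&= \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}\bigr] - \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{Y}^{\mathsf{T}}\bigr]\mathsf{D}^{\mathsf{T}} \\
&= \mathsf{E}\bigl[(\mathbf{X} - \mathsf{D}\mathbf{Y})\mathbf{X}^{\mathsf{T}}\bigr] \\
&= \mathsf{K}_{\mathbf{XX}} - \mathsf{D}\,\mathsf{E}\bigl[\mathbf{Y}\mathbf{X}^{\mathsf{T}}\bigr] \\
&= \mathsf{K}_{\mathbf{XX}} - \mathsf{E}\bigl[\mathbf{X}\mathbf{Y}^{\mathsf{T}}\bigr]\,\mathsf{K}_{\mathbf{YY}}^{-1}\,\mathsf{E}\bigl[\mathbf{Y}\mathbf{X}^{\mathsf{T}}\bigr],
\end{aligned}
\]

where the first equality follows by opening the second set of parentheses; the second
by (23.3) and (23.25b); the third by (23.63); the fourth by opening the parentheses
and using the linearity of the expectation; and the final equality by (23.64).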
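The formulas (23.59), (23.60), and (23.64) translate directly into a few lines of linear algebra. The following is a minimal sketch, assuming NumPy; the function name `conditional_gaussian`, the argument names, and the numerical covariance blocks are hypothetical and serve only as an illustration of the centered case.

```python
import numpy as np

def conditional_gaussian(K_xx, K_xy, K_yy, y):
    """Conditional law of X given Y = y for centered jointly Gaussian X, Y.

    K_xy stands for E[X Y^T]. Returns the matrix D of (23.64), the
    conditional mean (23.59), and the conditional covariance (23.60).
    """
    # D = K_xy K_yy^{-1}; K_yy is symmetric, so solve with the transpose.
    D = np.linalg.solve(K_yy, K_xy.T).T
    mean = D @ y
    cov = K_xx - D @ K_xy.T
    return D, mean, cov

# Example blocks of a valid joint covariance matrix (illustrative values).
K_xx = np.array([[2.0, 0.5],
                 [0.5, 1.0]])
K_xy = np.array([[0.8],
                 [0.3]])
K_yy = np.array([[1.5]])

D, mean, cov = conditional_gaussian(K_xx, K_xy, K_yy, np.array([3.0]))
print(mean)   # affine (here linear) in y
print(cov)    # does not depend on y
```

With this choice of `D`, the matrix `K_xy - D @ K_yy` is zero, which is the uncorrelatedness condition (23.63) that drives the proof.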
Theorem 23.7.4 has important consequences in Estimation Theory. A key result
in Estimation Theory is that if after observing that $\mathbf{Y} = \mathbf{y}$ for some $\mathbf{y} \in \mathbb{R}^{n_y}$ we
would like to estimate the random $n_x$-vector $\mathbf{X}$ using a (Borel measurable) function
$\mathbf{g} \colon \mathbb{R}^{n_y} \to \mathbb{R}^{n_x}$ so as to minimize the estimation error

\[ \mathsf{E}\bigl[\|\mathbf{X} - \mathbf{g}(\mathbf{Y})\|^2\bigr], \tag{23.65} \]

then an optimal choice for $\mathbf{g}(\cdot)$ is the conditional expectation

\[ \mathbf{g}(\mathbf{y}) = \mathsf{E}[\mathbf{X} \,|\, \mathbf{Y} = \mathbf{y}], \quad \mathbf{y} \in \mathbb{R}^{n_y}. \tag{23.66} \]

Theorem 23.7.4 demonstrates that if $\mathbf{X}$ and $\mathbf{Y}$ are jointly Gaussian and centered,
then $\mathsf{E}[\mathbf{X} \,|\, \mathbf{Y} = \mathbf{y}]$ is a linear function of $\mathbf{y}$ and is explicitly given by (23.59). Thus,
for jointly Gaussian centered random vectors, there is no loss in optimality in
limiting ourselves to linear estimators.


The optimality of choosing $\mathbf{g}(\cdot)$ as in (23.66) has a simple intuitive explanation. We
first note that it suffices to establish the result when $n_x = 1$, i.e., when estimating
a random variable rather than a random vector. Indeed, the squared-norm error in
estimating a random vector $\mathbf{X}$ with $n_x$ components is the sum of the squared errors
in estimating its components. To minimize the sum, one should therefore minimize
each of the terms. And the problem of estimating the $j$-th component of $\mathbf{X}$ based
on the observation $\mathbf{Y} = \mathbf{y}$ is a problem of estimating a random variable. Stated
differently, to estimate $\mathbf{X}$ so as to minimize the error (23.65) we should separately
estimate each of its components.


Having established that it suffices to prove the optimality of (23.66) when $n_x = 1$,
we now assume that $n_x = 1$ and denote the random variable to be estimated by $X$.
To study how to estimate X after observing that Y = y, we first consider the 
case where there is no observation. In this case, the estimate is a constant, and 
by Lemma 14.4.1 the optimal choice of that constant is the mean E[X]. We now 
view the general case where we observe Y = y as though there were no observables 
but X had the a posteriori distribution given Y = y. Utilizing the result for the 
case where there are no observables yields that estimating X by E[X|Y = y] is 
optimal. 


23.8 Moments and Wick’s Formula 


We next describe without proof a technique for computing moments of centered
Gaussian vectors. A sketch of a proof can be found in (Zvonkin, 1997).


Theorem 23.8.1 (Wick's Formula). Let $\mathbf{X}$ be a centered Gaussian $n$-vector and
let $g_1, \ldots, g_{2k} \colon \mathbb{R}^n \to \mathbb{R}$ be an even number of (not necessarily different) linear
functionals on $\mathbb{R}^n$. Then

\[
\mathsf{E}\bigl[g_1(\mathbf{X})\, g_2(\mathbf{X}) \cdots g_{2k}(\mathbf{X})\bigr]
= \sum \mathsf{E}\bigl[g_{p_1}(\mathbf{X})\, g_{q_1}(\mathbf{X})\bigr]\, \mathsf{E}\bigl[g_{p_2}(\mathbf{X})\, g_{q_2}(\mathbf{X})\bigr] \cdots \mathsf{E}\bigl[g_{p_k}(\mathbf{X})\, g_{q_k}(\mathbf{X})\bigr],
\tag{23.67}
\]

where the summation is over all permutations $p_1, q_1, p_2, q_2, \ldots, p_k, q_k$ of $1, 2, \ldots, 2k$
such that

\[ p_1 < p_2 < \cdots < p_k \tag{23.68a} \]

and

\[ p_1 < q_1, \quad p_2 < q_2, \quad \ldots, \quad p_k < q_k. \tag{23.68b} \]

The number of terms on the RHS of (23.67) is $1 \times 3 \times 5 \times \cdots \times (2k - 1)$.
Example 23.8.2. Suppose that $n = 1$, so $X$ is a centered univariate Gaussian.
Let $\sigma^2$ be its variance, and suppose we wish to compute $\mathsf{E}[X^4]$. We can express this
in the form of Theorem 23.8.1 with $k = 2$ and $g_1(x) = g_2(x) = g_3(x) = g_4(x) = x$.
By Wick's Formula

\[
\begin{aligned}
\mathsf{E}[X^4] &= \mathsf{E}[g_1(X)\, g_2(X)]\, \mathsf{E}[g_3(X)\, g_4(X)] + \mathsf{E}[g_1(X)\, g_3(X)]\, \mathsf{E}[g_2(X)\, g_4(X)] \\
&\quad + \mathsf{E}[g_1(X)\, g_4(X)]\, \mathsf{E}[g_2(X)\, g_3(X)] \\
&= 3\sigma^4,
\end{aligned}
\]

which is in agreement with (19.31).
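Example 23.8.3. Suppose that $\mathbf{X}$ is a bivariate centered Gaussian whose
components are of unit variance and of correlation coefficient $\rho \in [-1, 1]$. We
compute $\mathsf{E}\bigl[(X^{(1)})^2 (X^{(2)})^2\bigr]$ using Theorem 23.8.1 by setting $k = 2$ and by defining
$g_1(\mathbf{x}) = g_2(\mathbf{x}) = x^{(1)}$ and $g_3(\mathbf{x}) = g_4(\mathbf{x}) = x^{(2)}$. By Wick's Formula

\[
\begin{aligned}
\mathsf{E}\bigl[(X^{(1)})^2 (X^{(2)})^2\bigr]
&= \mathsf{E}[g_1(\mathbf{X})\, g_2(\mathbf{X})]\, \mathsf{E}[g_3(\mathbf{X})\, g_4(\mathbf{X})] + \mathsf{E}[g_1(\mathbf{X})\, g_3(\mathbf{X})]\, \mathsf{E}[g_2(\mathbf{X})\, g_4(\mathbf{X})] \\
&\quad + \mathsf{E}[g_1(\mathbf{X})\, g_4(\mathbf{X})]\, \mathsf{E}[g_2(\mathbf{X})\, g_3(\mathbf{X})] \\
&= \mathsf{E}\bigl[(X^{(1)})^2\bigr]\, \mathsf{E}\bigl[(X^{(2)})^2\bigr] + \mathsf{E}\bigl[X^{(1)} X^{(2)}\bigr]\, \mathsf{E}\bigl[X^{(1)} X^{(2)}\bigr] \\
&\quad + \mathsf{E}\bigl[X^{(1)} X^{(2)}\bigr]\, \mathsf{E}\bigl[X^{(1)} X^{(2)}\bigr] \\
&= 1 + 2\rho^2.
\end{aligned}
\tag{23.69}
\]

Similarly,

\[ \mathsf{E}\bigl[(X^{(1)})^3 X^{(2)}\bigr] = 3\rho. \tag{23.70} \]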
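Wick's Formula lends itself to a direct, if inefficient, implementation by enumerating pairings. The sketch below assumes NumPy and represents each linear functional $g_j$ by a vector $\boldsymbol{\alpha}_j$ with $g_j(\mathbf{x}) = \boldsymbol{\alpha}_j^{\mathsf{T}}\mathbf{x}$, so that $\mathsf{E}[g_p(\mathbf{X})\,g_q(\mathbf{X})] = \boldsymbol{\alpha}_p^{\mathsf{T}}\mathsf{K}\boldsymbol{\alpha}_q$ for a centered $\mathbf{X}$ of covariance $\mathsf{K}$. The names `pairings` and `wick_moment` are hypothetical.

```python
import numpy as np

def pairings(indices):
    """Yield every way of splitting an even-length list into pairs.

    The first element of each pair is the smallest index still unpaired,
    which enforces the ordering constraints (23.68a) and (23.68b).
    """
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def wick_moment(K, functionals):
    """E[prod_j alpha_j^T X] for centered Gaussian X with covariance K.

    `functionals` is a list with an even number of vectors alpha_j.
    """
    total = 0.0
    for pairing in pairings(list(range(len(functionals)))):
        term = 1.0
        for p, q in pairing:
            term *= functionals[p] @ K @ functionals[q]   # E[g_p(X) g_q(X)]
        total += term
    return total

rho = 0.4
K2 = np.array([[1.0, rho], [rho, 1.0]])
e1, e2 = np.eye(2)
moment_22 = wick_moment(K2, [e1, e1, e2, e2])   # compare with (23.69)
moment_31 = wick_moment(K2, [e1, e1, e1, e2])   # compare with (23.70)

sigma2 = 2.5
fourth = wick_moment(np.array([[sigma2]]), [np.array([1.0])] * 4)  # Example 23.8.2
print(moment_22, moment_31, fourth)
```

The number of pairings grows as $1 \times 3 \times \cdots \times (2k-1)$, so this enumeration is practical only for small $k$; it is meant as a check on hand calculations rather than as an efficient method.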
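The results of Section 19.9 on limits of Gaussian random variables extend to
Gaussian vectors. In this setting we consider random vectors $\mathbf{X}, \mathbf{X}_1, \mathbf{X}_2, \ldots$ defined
over the probability space $(\Omega, \mathcal{F}, P)$. We say that the sequence of random vectors
$\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ with probability one or almost
surely if

\[ \Pr\Bigl(\bigl\{\omega \in \Omega : \lim_{n \to \infty} \mathbf{X}_n(\omega) = \mathbf{X}(\omega)\bigr\}\Bigr) = 1. \tag{23.71} \]

The sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ in probability if

\[ \lim_{n \to \infty} \Pr\bigl[\|\mathbf{X}_n - \mathbf{X}\| \ge \epsilon\bigr] = 0, \quad \epsilon > 0. \tag{23.72} \]

The sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to the random vector $\mathbf{X}$ in mean square if

\[ \lim_{n \to \infty} \mathsf{E}\bigl[\|\mathbf{X}_n - \mathbf{X}\|^2\bigr] = 0. \tag{23.73} \]

Finally, the sequence of random vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots$ taking value in $\mathbb{R}^d$ converges
to the random vector $\mathbf{X}$ weakly or in distribution if

\[ \lim_{n \to \infty} \Pr\bigl[X_n^{(1)} \le \xi^{(1)}, \ldots, X_n^{(d)} \le \xi^{(d)}\bigr] = \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(d)} \le \xi^{(d)}\bigr] \tag{23.74} \]

for every vector $\boldsymbol{\xi} \in \mathbb{R}^d$ such that

\[ \lim_{\epsilon \downarrow 0} \Pr\bigl[X^{(1)} \le \xi^{(1)} - \epsilon, \ldots, X^{(d)} \le \xi^{(d)} - \epsilon\bigr] = \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(d)} \le \xi^{(d)}\bigr]. \tag{23.75} \]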


In analogy to Theorem 19.9.1 we next show that, irrespective of which of the above 
forms of convergence we consider, if a sequence of Gaussian vectors converges to 
some random vector X, then X must be Gaussian. 


Theorem 23.9.1. Let the random $d$-vectors $\mathbf{X}, \mathbf{X}_1, \mathbf{X}_2, \ldots$ be defined over a
common probability space. Let $\mathbf{X}_1, \mathbf{X}_2, \ldots$ each be Gaussian (with possibly different
mean vectors and covariance matrices). If the sequence $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges to $\mathbf{X}$
in the sense of (23.71) or (23.72) or (23.73), then $\mathbf{X}$ must be Gaussian.


Proof. The proof is based on Theorem 23.6.17, which demonstrates that it suffices
to consider linear functionals of the vectors in the sequence, and on the analogous
result for scalars (Theorem 19.9.1). We demonstrate the idea by considering the
case where the convergence is almost sure. If $\mathbf{X}_1, \mathbf{X}_2, \ldots$ converges almost surely
to $\mathbf{X}$, then for every $\boldsymbol{\alpha} \in \mathbb{R}^d$ the sequence $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_1, \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_2, \ldots$ converges almost surely
to $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$. Since, by Theorem 23.6.17, linear functionals of Gaussian vectors are
univariate Gaussians, it follows that the sequence $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_1, \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}_2, \ldots$ is a sequence of
Gaussian random variables. And since it converges almost surely to $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$, it follows
from Theorem 19.9.1 that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{X}$ must be Gaussian. Since this is true for every $\boldsymbol{\alpha}$
in $\mathbb{R}^d$, it follows from Theorem 23.6.17 that $\mathbf{X}$ must be a Gaussian vector.


In analogy to Theorem 19.9.2 we have the following result on weakly converging 
Gaussian vectors. 


Theorem 23.9.2 (Weakly Converging Gaussian Vectors). Let the sequence of
random $d$-vectors $\mathbf{X}_1, \mathbf{X}_2, \ldots$ be such that $\mathbf{X}_n \sim \mathcal{N}(\boldsymbol{\mu}_n, \mathsf{K}_n)$ for $n = 1, 2, \ldots$ Then
the sequence converges in distribution to some limiting distribution if, and only if,
there exist some $\boldsymbol{\mu} \in \mathbb{R}^d$ and some $d \times d$ matrix $\mathsf{K}$ such that

\[ \boldsymbol{\mu}_n \to \boldsymbol{\mu} \quad \text{and} \quad \mathsf{K}_n \to \mathsf{K}. \tag{23.76} \]

And if the sequence does converge in distribution, then it converges to the
multivariate Gaussian distribution of mean vector $\boldsymbol{\mu}$ and covariance matrix $\mathsf{K}$.


Proof. See (Gikhman and Skorokhod, 1996, Chapter I, Section 3, Theorem 4). 
23.10 Additional Reading


There are numerous books on Matrix Theory that discuss orthogonal matrices and 
positive semidefinite matrices. We mention here (Zhang, 1999, Section 5.2), (Her- 
stein, 2001, Chapter 6, Section 6.10), and (Axler, 1997, Chapter 7) on orthogonal 
matrices, and (Zhang, 1999, Chapter 6), (Axler, 1997, Chapter 7), and (Horn and 
Johnson, 1985, Chapter 7) on positive semidefinite matrices. Much more on the 
multivariate Gaussian can be found in (Tong, 1990) and (Johnson and Kotz, 1972, 
Chapter 35). For more on estimation and linear estimation, see (Poor, 1994) and
(Kailath, Sayed, and Hassibi, 2000).


23.11 Exercises 


Exercise 23.1 (Covariance Matrices). Which of the following matrices cannot be a co- 
variance matrix of some real random vector? 


am 5 1 2 10 1 41 
a=(5 4): B= (> 3): c= (i ae o=(4, re 


Exercise 23.2 (An Orthogonal Matrix of Determinant 1). Show that in Theorem 23.6.14 
the orthogonal matrix U can be chosen to have determinant +1. 


Exercise 23.3 (A Mixture of Gaussians). Let $X \sim \mathcal{N}(\mu_x, \sigma_x^2)$ and $Y \sim \mathcal{N}(\mu_y, \sigma_y^2)$ be
Gaussian random variables. Let $E$ take on the values 0 and 1 equiprobably and
independently of $(X, Y)$. Define the mixture RV

\[ Z = \begin{cases} X & \text{if } E = 0, \\ Y & \text{if } E = 1. \end{cases} \]

Must $Z$ be Gaussian? Can $Z$ be Gaussian? Compute $Z$'s characteristic function.


Exercise 23.4 (Multivariate Gaussians). Show that if $Z$ is a univariate Gaussian, then
the random vector $(Z, Z)^{\mathsf{T}}$ is a Gaussian vector. What is its canonical representation?


Exercise 23.5 (Manipulating Gaussians). Let $W_1, W_2, \ldots, W_5$ be IID $\mathcal{N}(0, 1)$. Define
$Y = 3W_1 + 4W_2 - 2W_3 + W_4 - W_5$ and $Z = W_1 - 4W_2 - 2W_3 + 3W_4 - W_5$. What is the
joint distribution of $(Y, Z)$?


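Exercise 23.6 (Largest Eigenvalue). Let $\mathbf{X}$ be a zero-mean Gaussian $n$-vector of
covariance matrix $\mathsf{K} \succeq 0$, and let $\lambda_{\max}$ denote the maximal eigenvalue of $\mathsf{K}$. Show that for some
random $n$-vector $\mathbf{Z}$ independent of $\mathbf{X}$

\[ \mathbf{X} + \mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \lambda_{\max}\mathsf{I}_n), \]

where $\mathsf{I}_n$ denotes the $n \times n$ identity matrix.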
Exercise 23.7 (The Error Probability Revisited). Show that p*(correct) of (21.68) in
Problem 21.5 can be rewritten as


p (correct) = a exp (- ss) Elexp (- max{="})] 


where (Eo, rere —E™M))T is a centered Gaussian with a covariance matrix whose Row-j 
Column-é entry is (Sj, 82) p- 


Exercise 23.8 (Gaussian Marginals). Let $X$ and $Z$ be IID $\mathcal{N}(0, 1)$. Let $Y = |Z|\,\mathrm{sgn}(X)$,
where $\mathrm{sgn}(X)$ is 1 if $X > 0$ and is $-1$ otherwise. Show that $X$ is Gaussian, that $Y$ is
Gaussian, but that they are not jointly Gaussian. Sketch the contour lines of their joint
probability density function.


Exercise 23.9 (Characteristic Function of a Random Vector). Let $\mathbf{X}$ be a random vector
with two components whose characteristic function is $\Phi_{\mathbf{X}}(\cdot)$. Express the characteristic
function of the sum of its components in terms of $\Phi_{\mathbf{X}}(\cdot)$.


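Exercise 23.10 (The Distribution of Linear Functionals). Let $\mathbf{X}$ and $\mathbf{Y}$ be random
$n$-vectors of components $X^{(1)}, \ldots, X^{(n)}$ and $Y^{(1)}, \ldots, Y^{(n)}$. Assume that for all
deterministic coefficients $\alpha_1, \ldots, \alpha_n \in \mathbb{R}$ the random variables $\sum_{j=1}^{n} \alpha_j X^{(j)}$ and $\sum_{j=1}^{n} \alpha_j Y^{(j)}$ have
the same distribution, i.e.,

\[ \sum_{j=1}^{n} \alpha_j X^{(j)} \stackrel{\mathscr{L}}{=} \sum_{j=1}^{n} \alpha_j Y^{(j)}, \quad (\alpha_1, \ldots, \alpha_n \in \mathbb{R}). \]

(i) Show that the characteristic function of $\mathbf{X}$ must be equal to that of $\mathbf{Y}$.
(ii) Show that $\mathbf{X}$ and $\mathbf{Y}$ must have the same distribution.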


Exercise 23.11 (Independence, Uncorrelatedness and Gaussianity). Let the random
variables $X$ and $H$ be independent with $X \sim \mathcal{N}(0, 1)$ and with $H$ taking on the values $\pm 1$
equiprobably. Let $Y = HX$ denote their product.


(i) Find the density of $Y$.
(ii) Are $X$ and $Y$ correlated?
(iii) Compute $\Pr[|X| > 1]$ and $\Pr[|Y| > 1]$.
(iv) Compute the probability that both $|X|$ and $|Y|$ exceed 1.
(v) Are $X$ and $Y$ independent?
(vi) Is the vector $(X, Y)^{\mathsf{T}}$ a Gaussian vector?


Exercise 23.12 (Expected Maximum of Jointly Gaussians).


(i) Let $(X_1, X_2, \ldots, X_n, Y)$ have an arbitrary joint distribution with $\mathsf{E}[Y] = 0$. Here
$Y$ need not be independent of $(X_1, X_2, \ldots, X_n)$. Prove that

\[ \mathsf{E}\Bigl[\max_{1 \le j \le n} \{X_j + Y\}\Bigr] = \mathsf{E}\Bigl[\max_{1 \le j \le n} \{X_j\}\Bigr]. \]

(ii) Use Part (i) to prove that if $(U, V)$ are jointly Gaussian and of zero mean, then

\[ \mathsf{E}[\max\{U, V\}] = \sqrt{\frac{\mathsf{E}\bigl[(U - V)^2\bigr]}{2\pi}}. \]
Exercise 23.13 (The Density of a Bivariate Gaussian). Let $X$ and $Y$ be jointly Gaussian
with means $\mu_x$ and $\mu_y$ and with positive variances $\sigma_x^2$ and $\sigma_y^2$. Let

\[ \rho = \frac{\mathrm{Cov}[X, Y]}{\sigma_x \sigma_y} \]

be their correlation coefficient. Assume $|\rho| < 1$.


(i) Find the joint density of X and Y. 
(ii) Find the conditional density of X given Y = y. 


Exercise 23.14 (A Training Symbol). Conditional on $(X_1, X_2) = (x_1, x_2)$, the observable
$(Y_1, Y_2)$ is given by

\[ Y_\nu = A x_\nu + Z_\nu, \quad \nu = 1, 2, \]

where $Z_1$, $Z_2$, and $A$ are independent with $Z_1, Z_2 \sim$ IID $\mathcal{N}(0, \sigma^2)$ and $A \sim \mathcal{N}(0, 1)$.
Suppose that $X_1 = 1$ (deterministically) and that $X_2$ takes on the values $\pm 1$ equiprobably.


(i) Derive an optimal rule for guessing $X_2$ based on $(Y_1, Y_2)$.


(ii) Consider a decoder that operates in two stages. In the first stage the decoder
estimates $A$ from $Y_1$ with an estimator that minimizes the mean squared-error.
In the second stage it uses the ML decoding rule for guessing $X_2$ based on $Y_2$
by pretending that $A$ is given by its estimate from the first stage. Compute the
probability of error of this decoder. Is it optimal?


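Exercise 23.15 (On Wick's Formula). Let $\mathbf{X}$ be a centered Gaussian $n$-vector, and let
$g_1, \ldots, g_{2k+1} \colon \mathbb{R}^n \to \mathbb{R}$ be an odd number of (not necessarily different) linear functionals
from $\mathbb{R}^n$ to $\mathbb{R}$. Show that

\[ \mathsf{E}\bigl[g_1(\mathbf{X})\, g_2(\mathbf{X}) \cdots g_{2k+1}(\mathbf{X})\bigr] = 0. \]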


Exercise 23.16 (Jointly Gaussians with Positive Correlation). Let $X$ and $Y$ be jointly
Gaussian with means $\mu_x$ and $\mu_y$, positive variances $\sigma_x^2$ and $\sigma_y^2$, and correlation coefficient $\rho$
as in Exercise 23.13 satisfying $|\rho| < 1$.


(i) Show that, conditional on $Y = y$, the distribution of $X$ is Gaussian with mean
$\mu_x + \rho \frac{\sigma_x}{\sigma_y}(y - \mu_y)$ and variance $\sigma_x^2(1 - \rho^2)$.
(ii) Show that if $\rho \ge 0$, then the family $f_{X|Y}(x|y)$ has the monotone likelihood ratio
property that the mapping

\[ x \mapsto \frac{f_{X|Y}(x|y)}{f_{X|Y}(x|y')} \]

is nondecreasing whenever $y' < y$. Here $f_{X|Y}(\cdot|y)$ is the conditional density of $X$
given $Y = y$.
(iii) Show that if $\rho \ge 0$, then the joint density $f_{X,Y}(\cdot)$ has the Total Positivity of
Order 2 (TP2) property, i.e.,

\[ f_{X,Y}(x', y)\, f_{X,Y}(x, y') \le f_{X,Y}(x, y)\, f_{X,Y}(x', y'), \quad (x' < x,\ y' < y). \]

See (Tong, 1990, Chapter 4, Section 4.3.1, Fact 4.3.1 and Theorem 4.3.1).
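Exercise 23.17 (Price's Theorem). Let $\mathbf{X}$ be a centered Gaussian $n$-vector of covariance
matrix $\Lambda$. Let $\lambda^{(j,\ell)} = \mathsf{E}\bigl[X^{(j)} X^{(\ell)}\bigr]$ be the Row-$j$ Column-$\ell$ entry of $\Lambda$. Let $f_{\mathbf{X}}(\mathbf{x}; \Lambda)$
denote the density of $\mathbf{X}$ (when $\Lambda$ is nonsingular).


(i) Expressing the FT of the partial derivative of a function in terms of the FT of the
original function and using the characteristic function of a Gaussian (23.46), derive
Plackett's Identities

\[
\frac{\partial f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial \lambda^{(j,j)}} = \frac{1}{2}\, \frac{\partial^2 f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial (x^{(j)})^2},
\qquad
\frac{\partial f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial \lambda^{(j,\ell)}} = \frac{\partial^2 f_{\mathbf{X}}(\mathbf{x}; \Lambda)}{\partial x^{(j)}\, \partial x^{(\ell)}},
\quad j \ne \ell.
\]

(ii) Using integration by parts, derive Price's Theorem: if $h \colon \mathbb{R}^n \to \mathbb{R}$ is twice
continuously differentiable with $h$ and its first and second derivatives growing at most
polynomially in $\|\mathbf{x}\|$ as $\|\mathbf{x}\| \to \infty$, then

\[
\frac{\partial\, \mathsf{E}[h(\mathbf{X})]}{\partial \lambda^{(j,\ell)}} = \int_{\mathbb{R}^n} \frac{\partial^2 h(\mathbf{x})}{\partial x^{(j)}\, \partial x^{(\ell)}}\, f_{\mathbf{X}}(\mathbf{x}; \Lambda)\, d\mathbf{x},
\quad j \ne \ell.
\]

(See (Adler, 1990, Chapter 2, Section 2.2) for the case where $\Lambda$ is singular.)

(iii) Show that if in addition to the assumptions of Part (ii) we also assume that for
some $j \ne \ell$

\[ \frac{\partial^2 h(\mathbf{x})}{\partial x^{(j)}\, \partial x^{(\ell)}} \ge 0, \quad \mathbf{x} \in \mathbb{R}^n, \tag{23.77} \]

then $\mathsf{E}[h(\mathbf{X})]$ is a nondecreasing function of $\lambda^{(j,\ell)}$.

(iv) Conclude that if $h(\mathbf{x}) = \prod_{\nu=1}^{n} g_\nu(x^{(\nu)})$, where for each $\nu \in \{1, \ldots, n\}$ the function
$g_\nu \colon \mathbb{R} \to \mathbb{R}$ is nonnegative, nondecreasing, twice continuously differentiable, and
satisfying the growth conditions of $h$ in Part (ii), then

\[ \mathsf{E}\Bigl[\prod_{\nu=1}^{n} g_\nu\bigl(X^{(\nu)}\bigr)\Bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$.

(v) By choosing $g_\nu(\cdot)$ to approximate the step function $\alpha \mapsto \mathrm{I}\{\alpha \ge \zeta^{(\nu)}\}$ for properly
chosen $\zeta^{(\nu)}$, prove Slepian's Inequality: if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Lambda)$, then for every choice of
$\xi^{(1)}, \ldots, \xi^{(n)} \in \mathbb{R}$

\[ \Pr\bigl[X^{(1)} \le \xi^{(1)}, \ldots, X^{(n)} \le \xi^{(n)}\bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$. See (Tong, 1990,
Chapter 5, Section 5.1.4, Theorem 5.1.7).

(vi) Modify the arguments in Parts (iv) and (v) to show that if $\mathbf{X} \sim \mathcal{N}(\boldsymbol{\mu}, \Lambda)$, then for
every choice of $\xi^{(1)}, \ldots, \xi^{(n)} \in \mathbb{R}$

\[ \Pr\bigl[X^{(1)} \ge \xi^{(1)}, \ldots, X^{(n)} \ge \xi^{(n)}\bigr] \]

is monotonically nondecreasing in $\lambda^{(j,\ell)}$ whenever $j \ne \ell$. See (Adler, 1990,
Chapter 2, Section 2.2, Corollary 2.4).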


Exercise 23.18 (Jointly Gaussians of Equal Sign). Let $X$ and $Y$ be jointly Gaussian and
centered with positive variances and correlation coefficient $\rho$. Prove that

\[ \Pr[XY > 0] = \frac{1}{2} + \frac{\phi}{\pi}, \]

where $-\pi/2 \le \phi \le \pi/2$ is such that $\sin\phi = \rho$. We propose the following approach.

(i) Show that it suffices to prove the result when $X$ and $Y$ are of unit variance.

(ii) Show that, for such $X$ and $Y$, if we define

\[ W = \frac{1}{\sqrt{1 - \rho^2}}\, X - \frac{\rho}{\sqrt{1 - \rho^2}}\, Y, \qquad Z = Y, \]

then $W$ and $Z$ are IID $\mathcal{N}(0, 1)$.

(iii) Show that $X$ and $Y$ can be expressed as

\[ X = R \sin(\Theta + \phi), \qquad Y = R \cos\Theta, \]

where $\phi$ is as defined before, $\Theta$ is uniformly distributed over the interval $[-\pi, \pi)$,
$R$ is independent of $\Theta$, and $f_R(r) = r\, e^{-r^2/2}\, \mathrm{I}\{r \ge 0\}$.

(iv) Justify the calculation

\[
\begin{aligned}
\Pr[XY > 0] &= 2 \Pr[X > 0,\ Y > 0] \\
&= 2 \Pr[\sin(\Theta + \phi) > 0,\ \cos\Theta > 0] \\
&= \frac{1}{2} + \frac{\phi}{\pi}.
\end{aligned}
\]

Hint: Exercise 19.7 may be useful for Part (iii).


Chapter 24 


Complex Gaussians and Circular Symmetry 


24.1 Introduction 


This chapter introduces the complex Gaussian distribution and the circular sym- 
metry property. We start with the scalar case and then extend these notions to 
random vectors. We rely heavily on Chapter 17 for the basic properties of complex 
random variables and on Chapter 23 for the properties of the multivariate Gaussian 
distribution. 


24.2 Scalars 


24.2.1 Standard Complex Gaussians 


Definition 24.2.1 (Standard Complex Gaussian). A standard complex Gaus- 
sian is a complex random variable whose real and imaginary parts are independent 


N(0,1/2) random variables. 


If $W$ is a standard complex Gaussian, then its density is given by

\[ f_W(w) = \frac{1}{\pi}\, e^{-|w|^2}, \quad w \in \mathbb{C}, \tag{24.1} \]

because

\[
\begin{aligned}
f_W(w) &= f_{\mathrm{Re}(W), \mathrm{Im}(W)}\bigl(\mathrm{Re}(w), \mathrm{Im}(w)\bigr) \\
&= f_{\mathrm{Re}(W)}\bigl(\mathrm{Re}(w)\bigr)\, f_{\mathrm{Im}(W)}\bigl(\mathrm{Im}(w)\bigr) \\
&= \frac{1}{\sqrt{\pi}}\, e^{-\mathrm{Re}(w)^2} \cdot \frac{1}{\sqrt{\pi}}\, e^{-\mathrm{Im}(w)^2} \\
&= \frac{1}{\pi}\, e^{-|w|^2}, \quad w \in \mathbb{C},
\end{aligned}
\]

where the first equality follows from the definition of the density $f_W(w)$ of a
CRV $W$ at $w \in \mathbb{C}$ as the joint density $f_{\mathrm{Re}(W), \mathrm{Im}(W)}$ of its real and imaginary
parts $(\mathrm{Re}(W), \mathrm{Im}(W))$ evaluated at $(\mathrm{Re}(w), \mathrm{Im}(w))$ (Section 17.3.1); the second
because the real and imaginary parts of a standard complex Gaussian are
independent; the third because the real and imaginary parts of a standard complex Gaussian are
zero-mean variance-$1/2$ real Gaussians whose density can thus be computed from
(19.6) (by substituting $1/2$ for $\sigma^2$); and where the final equality follows because for
any complex number $w$ we have $\mathrm{Re}(w)^2 + \mathrm{Im}(w)^2 = |w|^2$.
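The derivation of (24.1) can be spot-checked with the standard library alone. In the sketch below the function names are hypothetical; it compares the product of two real $\mathcal{N}(0, 1/2)$ densities with the closed form, and also evaluates the closed form at a rotated point to illustrate that the density depends on $w$ only through $|w|$.

```python
import cmath
import math

def real_gaussian_density(x, variance):
    """Density of a zero-mean real Gaussian of the given variance at x."""
    return math.exp(-x * x / (2 * variance)) / math.sqrt(2 * math.pi * variance)

def standard_complex_gaussian_density(w):
    """Closed form (24.1) evaluated at the complex number w."""
    return math.exp(-abs(w) ** 2) / math.pi

w = complex(0.7, -0.4)

# Product of the densities of the independent real and imaginary parts.
product_form = real_gaussian_density(w.real, 0.5) * real_gaussian_density(w.imag, 0.5)
closed_form = standard_complex_gaussian_density(w)

# Rotating w by an arbitrary angle leaves |w|, and hence the density, unchanged.
rotated = w * cmath.exp(1j * 1.3)
rotated_value = standard_complex_gaussian_density(rotated)
print(product_form, closed_form, rotated_value)
```

All three printed values should agree up to floating-point rounding.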

Because the real and imaginary parts of a standard complex Gaussian $W$ are of
zero mean, it follows that

\[ \mathsf{E}[W] = \mathsf{E}[\mathrm{Re}(W)] + i\, \mathsf{E}[\mathrm{Im}(W)] = 0. \]

And because they are each of variance $1/2$, it follows from (17.14c) that a standard
complex Gaussian $W$ has unit variance

\[ \mathrm{Var}[W] = \mathsf{E}\bigl[|W|^2\bigr] = 1. \tag{24.2} \]

Moreover, since a standard complex Gaussian is of zero mean and since its real
and imaginary parts are of equal variance and uncorrelated, a standard complex Gaussian
is proper (Definition 17.3.1 and Proposition 17.3.2), i.e.,

\[ \mathsf{E}[W] = 0 \quad \text{and} \quad \mathsf{E}[W^2] = 0. \tag{24.3} \]

Finally note that, by (24.1), the density $f_W(\cdot)$ of a standard complex Gaussian
is radially-symmetric, i.e., its value at $w \in \mathbb{C}$ depends on $w$ only via its
modulus $|w|$. A CRV whose density is radially-symmetric is said to be
circularly-symmetric, but the definition of circular symmetry applies also to complex
random variables that do not have a density. This is the topic of the next section.


24.2.2 Circular Symmetry 


Definition 24.2.2 (Circularly-Symmetric CRV). A CRV $Z$ is said to be
circularly-symmetric if for any deterministic $\phi \in [-\pi, \pi)$ the distribution of $e^{i\phi}Z$ is identical
to the distribution of $Z$:

\[ e^{i\phi} Z \stackrel{\mathscr{L}}{=} Z, \quad \phi \in [-\pi, \pi). \tag{24.4} \]

Note 24.2.3. If the expectation of a circularly-symmetric CRV is defined, then it
must be zero.


Proof. Let $Z$ be circularly-symmetric. It then follows from (24.4) that $e^{i\phi}Z$ and $Z$
are of equal expectation, so

\[ \mathsf{E}[Z] = \mathsf{E}\bigl[e^{i\phi} Z\bigr] = e^{i\phi}\, \mathsf{E}[Z], \quad \phi \in [-\pi, \pi), \]

which, by considering a $\phi$ for which $e^{i\phi} \ne 1$, implies that $\mathsf{E}[Z]$ must be zero.


To shed some light on the definition of circular symmetry we shall need
Proposition 24.2.5 ahead, which is highly intuitive but a bit cumbersome to state. Before
stating it we provide its discrete counterpart, which is a bit easier to state: it
makes formal the intuition that if after giving the wheel-of-fortune an arbitrary
spin, you give it another fair spin, then the combined result is a fair spin that does
not depend on the initial spin. The case $n = 2$ is critical in cryptography. It shows
that taking the mod-2 sum of a binary source sequence with a sequence of IID
random bits results in a sequence that is independent of the source sequence.


Proposition 24.2.4. Fix a positive integer $n$, and define the set $\mathcal{A} = \{0, \ldots, n-1\}$.
Let $N$ be a RV taking value in the set $\mathcal{A}$. Then the following statements are
equivalent:


(a) The RV $N$ is uniformly distributed over the set $\mathcal{A}$.


(b) For any integer-valued RV $K$ that is independent of $N$, the RV $(N + K) \bmod n$
is independent of $K$ and uniformly distributed over $\mathcal{A}$.$^{1}$


Proof. We first show (b) $\Rightarrow$ (a). To this end, define $K$ to be a RV that takes on
the value zero deterministically. Being deterministic, it is independent of every
RV, and in particular of $N$. Statement (b) thus guarantees that $(N + 0) \bmod n$ is
uniformly distributed over $\mathcal{A}$. Since we have assumed from the outset that $N$ takes
value in $\mathcal{A}$, it follows that $(N + 0) \bmod n = N$, so the uniformity of $(N + 0) \bmod n$
over $\mathcal{A}$ implies the uniformity of $N$ over $\mathcal{A}$.

We next show (a) $\Rightarrow$ (b). To this end, we need to show that if $N$ is uniformly
distributed over $\mathcal{A}$ and if $K$ is independent of $N$, then$^{2}$

\[ \Pr\bigl[((N + K) \bmod n) = a \,\big|\, K = k\bigr] = \frac{1}{n}, \quad (k \in \mathbb{Z},\ a \in \mathcal{A}). \tag{24.5} \]

By the independence of $N$ and $K$ it follows that

\[ \Pr\bigl[((N + K) \bmod n) = a \,\big|\, K = k\bigr] = \Pr\bigl[((N + k) \bmod n) = a\bigr], \quad (k \in \mathbb{Z},\ a \in \mathcal{A}), \]

so to prove (24.5) it suffices to prove

\[ \Pr\bigl[((N + k) \bmod n) = a\bigr] = \frac{1}{n}, \quad (k \in \mathbb{Z},\ a \in \mathcal{A}). \tag{24.6} \]

This can be proved as follows. Because $N$ is uniformly distributed over $\mathcal{A}$, it follows
that $N + k$ is uniformly distributed over the set $\{k, k+1, \ldots, k+n-1\}$. And, because
the mapping $m \mapsto (m \bmod n)$ is a one-to-one mapping from $\{k, k+1, \ldots, k+n-1\}$
onto $\mathcal{A}$, this implies that $(N + k) \bmod n$ is also uniformly distributed over $\mathcal{A}$, thus
establishing (24.6).
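The implication (a) $\Rightarrow$ (b) can be checked exactly on small examples by computing the joint PMF with rational arithmetic. The following is a sketch using only the standard library; the function name `joint_pmf_of_sum_and_K` and the example PMF of $K$ are hypothetical. With $n = 2$ it mirrors the mod-2 masking of a biased source bit by a fair random bit.

```python
from fractions import Fraction

def joint_pmf_of_sum_and_K(n, pmf_K):
    """Joint PMF of ((N + K) mod n, K) for N uniform on {0, ..., n-1}.

    N is taken to be independent of K; pmf_K maps each integer k to
    Pr[K = k]. Exact fractions avoid any floating-point ambiguity.
    """
    joint = {}
    for k, p_k in pmf_K.items():
        for value_n in range(n):
            a = (value_n + k) % n          # Python's % returns a value in {0, ..., n-1}
            joint[(a, k)] = joint.get((a, k), Fraction(0)) + Fraction(1, n) * p_k
    return joint

n = 2
pmf_K = {0: Fraction(9, 10), 1: Fraction(1, 10)}   # a heavily biased "source bit"
joint = joint_pmf_of_sum_and_K(n, pmf_K)

for (a, k), p in sorted(joint.items()):
    # Each entry factors as (1/n) * Pr[K = k]: uniform and independent of K.
    print(a, k, p, Fraction(1, n) * pmf_K[k])
```

Every joint probability equals $\frac{1}{n}\Pr[K = k]$, which is precisely the statement that $(N + K) \bmod n$ is uniform over $\mathcal{A}$ and independent of $K$.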


Proposition 24.2.5. Let $\Theta$ be a RV taking value in $[-\pi, \pi)$. Then the following
statements are equivalent:


¹Here $m \bmod n$ is the remainder of dividing $m$ by $n$, i.e., the unique $\nu \in A$ such that $m - \nu$
is an integer multiple of $n$. E.g., $17 \bmod 8 = 1$.

²Recall that the random variables $X$ and $Y$ are independent if, and only if, the conditional
distribution of $X$ given $Y$ is equal to the marginal distribution of $X$.
Figure 24.1: The function $\xi \mapsto (\xi \bmod [-\pi, \pi))$ plotted for $\xi \in [-\pi + \phi, \pi + \phi)$.


(a) The RV $\Theta$ is uniformly distributed over $[-\pi, \pi)$.


(b) For any real RV $\Phi$ that is independent of $\Theta$, the RV $(\Theta + \Phi) \bmod [-\pi, \pi)$ is
independent of $\Phi$ and uniformly distributed over the interval $[-\pi, \pi)$.³


Proof. The proof is similar to the proof of Proposition 24.2.4 but with an added 
twist. The twist is needed because if X has a uniform density and if a function g 
is one-to-one (injective) and onto (surjective), then g(X) need not be uniformly 
distributed. (For example, if $X \sim \mathcal{U}([0,1])$ and if $g\colon [0,1] \to [0,1]$ maps $\xi$ to $\xi^2$,
then $g(X)$ is not uniform.)


To prove that (b) implies (a) we simply apply (b) to the deterministic RV $\Phi = 0$.


We next prove that (a) implies (b). As in the discrete case, it suffices to show that
if $\Theta$ is uniformly distributed over $[-\pi, \pi)$, then for any deterministic $\phi \in \mathbb{R}$ the
distribution of $(\Theta + \phi) \bmod [-\pi, \pi)$ is uniform over $[-\pi, \pi)$, irrespective of $\phi$. To
this end we first note that because $\Theta$ is uniform over $[-\pi, \pi)$ it follows that $\Theta + \phi$ is
uniform over $[\phi - \pi, \phi + \pi)$. Consider now the mapping $g\colon [\phi - \pi, \phi + \pi) \to [-\pi, \pi)$
defined by $g\colon \xi \mapsto (\xi \bmod [-\pi, \pi))$. This function is a one-to-one mapping onto
$[-\pi, \pi)$ and is differentiable except at the point $\xi^* \in [\phi - \pi, \phi + \pi)$ satisfying
$\xi^* \bmod [-\pi, \pi) = -\pi$, i.e., the point $\xi^* \in [\phi - \pi, \phi + \pi)$ of the form $\xi^* = 2\pi m + \pi$ for
some integer $m$. At all other points its derivative is 1; see Figure 24.1. (Incidentally,
$-\pi + \phi$ is mapped to a negative number if $\phi < \xi^*$ and to a positive number if
$\phi > \xi^*$. In Figure 24.1 we assume the latter.) Applying the formula for computing
the density of $g(X)$ from the density of $X$ (Theorem 17.3.4) we find that if $\Theta + \phi$
is uniform over $[\phi - \pi, \phi + \pi)$, then $g(\phi + \Theta)$ is uniform over $[-\pi, \pi)$.
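The wrap-around behavior of the mapping g can be illustrated on a grid. The sketch below is only a discretized illustration under assumptions made here: angles are measured in units of π and stored as exact fractions, the helper name `wrap` is hypothetical, and the shift φ = 3π/4 is chosen to be a multiple of the cell width so that grid points map exactly onto grid points.

```python
from fractions import Fraction


def wrap(q):
    """q mod [-1, 1) in units of pi: the unique r in [-1, 1) with q - r an even integer."""
    return (q + 1) % 2 - 1


m = 8                                   # number of equal cells in [-1, 1)
phi = Fraction(3, 4)                    # shift, in units of pi (a multiple of 2/m)

# Midpoints of the m cells of [-1, 1), standing in for a uniform Theta.
grid = [Fraction(2 * j + 1, m) - 1 for j in range(m)]
# The same points shifted by phi lie in [phi - 1, phi + 1).
shifted = [q + phi for q in grid]
# Wrapping the shifted points back into [-1, 1) merely permutes the grid.
wrapped = sorted(wrap(q) for q in shifted)

print(wrapped == grid)
print(wrap(Fraction(17, 4)), wrap(Fraction(1)))
```

Because the wrapped points are a permutation of the original equally weighted grid, the discretized distribution is unchanged by the shift, which is the discrete shadow of the unit-derivative argument above.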


With the aid of Proposition 24.2.5 we can now give alternative characterizations 
of circular symmetry. 


³Here $x \bmod [-\pi, \pi)$ is the unique $\xi \in [-\pi, \pi)$ such that $x - \xi$ is an integer multiple of $2\pi$.
Proposition 24.2.6 (Characterizing Circular Symmetry). Let $Z$ be a CRV with
a density. Then each of the following statements is equivalent to the statement 
that Z is circularly-symmetric: 


(a) The distribution of $e^{i\phi}Z$ is identical to the distribution of $Z$, for any deterministic $\phi \in [-\pi, \pi)$.

(b) The CRV $Z$ has a radially-symmetric density function, i.e., a density $f_Z(\cdot)$
whose value at $z$ depends on $z$ only via its modulus $|z|$.

(c) The CRV $Z$ can be written as $Z = R e^{i\Theta}$, where $R \geq 0$ and $\Theta$ are independent
real random variables and $\Theta \sim \mathcal{U}([-\pi, \pi))$.


Proof. Statement (a) is the definition of circular symmetry (Definition 24.2.2).
The proof of (a) $\Rightarrow$ (b) is slightly obtuse because the density of a CRV is not
unique.⁴ We begin by noting that if $Z$ is of density $f_Z(\cdot)$, then by (17.34) the
CRV $e^{i\phi}Z$ is of density $w \mapsto f_Z(e^{-i\phi}w)$. Thus, if $Z \stackrel{\mathscr{L}}{=} e^{i\phi}Z$ and if $Z$ is of density
$f_Z(\cdot)$, then $Z$ is also of density $w \mapsto f_Z(e^{-i\phi}w)$. Consequently, if $Z$ is circularly-symmetric,
then for every $\phi \in [-\pi, \pi)$ the mapping $w \mapsto f_Z(e^{-i\phi}w)$ is a density
for $Z$. We can therefore conclude that the mapping
\[
w \mapsto \frac{1}{2\pi} \int_{-\pi}^{\pi} f_Z(e^{-i\phi}w)\, d\phi
\]
is also a density for $Z$, and this function is radially-symmetric.
The fact that (b) $\Rightarrow$ (c) follows because if we define $R$ to be the magnitude of $Z$
and $\Theta$ to be its argument, then $Z = R e^{i\Theta}$, and
\begin{align*}
f_{R,\Theta}(r, \theta) &= r f_Z(r e^{i\theta}) \\
&= r f_Z(r) \\
&= \bigl(2\pi r f_Z(r)\bigr) \frac{1}{2\pi},
\end{align*}
where the first equality follows from (17.29) and the second from our assumption
that $f_Z(z)$ depends on $z$ only via its modulus $|z|$. The joint density of $R, \Theta$ is thus
of a product form, thereby indicating that $R$ and $\Theta$ are independent. And it does
not depend on $\theta$, thus indicating that its marginal $\Theta$ is uniformly distributed.


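We finally show that (c) $\Rightarrow$ (a). To that end we assume that $R \geq 0$ and $\Theta$ are
independent with $\Theta$ being uniformly distributed over $[-\pi, \pi)$ and proceed to show
that $R e^{i\Theta}$ is circularly-symmetric, i.e., that
\[
R e^{i\Theta} \stackrel{\mathscr{L}}{=} R e^{i(\Theta + \phi)}, \quad \phi \in [-\pi, \pi). \tag{24.7}
\]
To prove (24.7) we note that
\begin{align*}
e^{i(\Theta + \phi)} &= e^{i((\Theta + \phi) \bmod [-\pi, \pi))} \\
&\stackrel{\mathscr{L}}{=} e^{i\Theta}, \tag{24.8}
\end{align*}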


⁴And not all the functions that are densities for a given circularly-symmetric CRV $Z$ are
radially-symmetric. The radial symmetry can be broken on a set of Lebesgue measure zero. We
can therefore only claim that there exists “a” radially-symmetric density function for $Z$.
where the first equality follows from the periodicity of the complex exponentials,
and where the equality in distribution follows from Proposition 24.2.5 because
$\Theta \sim \mathcal{U}([-\pi, \pi))$. The proof is now completed by noting that (24.7) follows from
(24.8) and from the independence of $R$ and $\Theta$. (If $X$ is independent of $Y$, if $X$ is
independent of $Z$, and if $Y \stackrel{\mathscr{L}}{=} Z$, then $(X, Y) \stackrel{\mathscr{L}}{=} (X, Z)$ and hence $XY \stackrel{\mathscr{L}}{=} XZ$.)


Example 24.2.7. Let the CRV $Z$ be given by $Z = e^{i\Phi}$, where $\Phi \sim \mathcal{U}([-\pi, \pi))$.
Then Z is uniformly distributed over the unit circle {z: |z| = 1} and is circularly- 
symmetric. It does not have a density. 


24.2.3 Properness and Circular Symmetry 


Proposition 24.2.8. Every finite-variance circularly-symmetric CRV is proper. 


Proof. Let $Z$ be a finite-variance circularly-symmetric CRV. By Note 24.2.3 it
follows that $\mathsf{E}[Z] = 0$. To conclude the proof it remains to show that $\mathsf{E}[Z^2] = 0$.
To this end we note that
\begin{align*}
\mathsf{E}[Z^2] &= e^{-i2\phi}\, \mathsf{E}\bigl[(e^{i\phi}Z)^2\bigr] \\
&= e^{-i2\phi}\, \mathsf{E}[Z^2], \quad \phi \in [-\pi, \pi), \tag{24.9}
\end{align*}
where the first equality follows by rewriting $Z^2$ as $e^{-i2\phi}(e^{i\phi}Z)^2$, and where the
second equality follows because the circular symmetry of $Z$ guarantees that $Z$
and $e^{i\phi}Z$ have the same law, so the expectations of their squares must be equal.
But (24.9) cannot be satisfied for all $\phi \in [-\pi, \pi)$ (or for that matter for any $\phi$ such
that $e^{i2\phi} \neq 1$) unless $\mathsf{E}[Z^2] = 0$.


Note 24.2.9. Not every proper CRV is circularly-symmetric. 


Proof. Consider the CRV $Z$ that takes on the four values $1 + i$, $1 - i$, $-1 + i$, and
$-1 - i$ equiprobably. Its real and imaginary parts are independent, each taking on
the values $\pm 1$ equiprobably. Computing $\mathsf{E}[Z]$ and $\mathsf{E}[Z^2]$ we find that they are both
zero, so $Z$ is proper. To see that $Z$ is not circularly-symmetric consider the random
variable $e^{i\pi/4}Z$. Its distribution is different from the distribution of $Z$ because $Z$
takes value in the set $\{1 + i, -1 + i, 1 - i, -1 - i\}$, and $e^{i\pi/4}Z$ takes value in the
rotated set $\{\sqrt{2}, -\sqrt{2}, \sqrt{2}\,i, -\sqrt{2}\,i\}$.
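The counterexample in this proof is small enough to verify numerically. The sketch below computes E[Z] and E[Z²] for the four-point distribution and compares the support of the rotated variable with the original support. The helper `same_support` and its tolerance are assumptions introduced here for the floating-point comparison.

```python
import cmath
import math

# The four equiprobable values of Z.
support = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]

mean = sum(support) / 4                         # E[Z]
pseudo_variance = sum(z * z for z in support) / 4   # E[Z^2]

# Rotate every point by pi/4.
rotated = [cmath.exp(1j * math.pi / 4) * z for z in support]


def same_support(xs, ys, tol=1e-12):
    """True if every point of xs lies within tol of some point of ys."""
    return all(any(abs(x - y) < tol for y in ys) for x in xs)


r2 = math.sqrt(2)
expected_rotated = [r2, -r2, r2 * 1j, -r2 * 1j]

print(mean, pseudo_variance)                    # both zero: Z is proper
print(same_support(rotated, expected_rotated))  # rotated support matches the proof
print(same_support(rotated, support))           # but differs from the support of Z
```

Properness only constrains the first two moments, so it cannot detect that the support has moved.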


The fact that not every proper CRV is circularly-symmetric is not surprising be- 
cause whether a CRV is proper or not is determined solely by its mean and by the 
covariance matrix of its real and imaginary parts, whereas circular symmetry has 
to do with the entire distribution. 


24.2.4 Complex Gaussians 


The definition of a complex Gaussian builds on the definition of a real Gaussian 
vector (Definition 23.1.1). 
Definition 24.2.10 (Complex Gaussian). A complex Gaussian is a CRV whose
real and imaginary parts are jointly Gaussian real random variables. A centered 
complex Gaussian is a complex Gaussian of zero mean. 


An example of a complex Gaussian is the standard complex Gaussian, which we 
encountered in Section 24.2.1. 


The class of complex Gaussians is closed under multiplication by deterministic
complex numbers. Thus, if $Z$ is a complex Gaussian and if $\alpha \in \mathbb{C}$ is deterministic,
then $\alpha Z$ is also a complex Gaussian. Indeed,
\[
\begin{pmatrix} \operatorname{Re}(\alpha Z) \\ \operatorname{Im}(\alpha Z) \end{pmatrix}
=
\begin{pmatrix} \operatorname{Re}(\alpha) & -\operatorname{Im}(\alpha) \\ \operatorname{Im}(\alpha) & \operatorname{Re}(\alpha) \end{pmatrix}
\begin{pmatrix} \operatorname{Re}(Z) \\ \operatorname{Im}(Z) \end{pmatrix},
\]
so the claim follows from the fact that multiplying a real Gaussian vector by a
deterministic real matrix results in a real Gaussian vector (Proposition 23.6.3).
We leave it to the reader to verify that, more generally, if $Z$ is a complex Gaussian
and if $\alpha, \beta \in \mathbb{C}$ are deterministic, then $\alpha Z + \beta Z^*$ is also a complex Gaussian. (This
is a special case of Proposition 24.3.9 ahead.)
Not every centered complex Gaussian can be expressed as the scaling of a standard
complex Gaussian by some complex number. But the following result characterizes
those that can:


Proposition 24.2.11. 


(i) For every centered complex Gaussian $Z$ we can find coefficients $\alpha, \beta \in \mathbb{C}$ so
that
\[
Z \stackrel{\mathscr{L}}{=} \alpha W + \beta W^*, \tag{24.10}
\]
where $W$ is a standard complex Gaussian.


(ii) A centered complex Gaussian $Z$ is proper if, and only if, there exists some
$\alpha \in \mathbb{C}$ such that $Z \stackrel{\mathscr{L}}{=} \alpha W$, where $W$ is a standard complex Gaussian.


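Proof. We begin with Part (i). First note that since $Z$ is a complex Gaussian, its
real and imaginary parts are jointly Gaussian, and it follows from Corollary 23.6.13
that there exist deterministic real numbers $a^{(1,1)}, a^{(1,2)}, a^{(2,1)}, a^{(2,2)}$ such that
\[
\begin{pmatrix} \operatorname{Re}(Z) \\ \operatorname{Im}(Z) \end{pmatrix}
\stackrel{\mathscr{L}}{=}
\begin{pmatrix} a^{(1,1)} & a^{(1,2)} \\ a^{(2,1)} & a^{(2,2)} \end{pmatrix}
\begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \tag{24.11}
\]
where $W_1$ and $W_2$ are independent real standard Gaussians. Next note that by
direct computation
\[
\begin{pmatrix} \operatorname{Re}(\alpha W + \beta W^*) \\ \operatorname{Im}(\alpha W + \beta W^*) \end{pmatrix}
=
\begin{pmatrix}
\frac{\operatorname{Re}(\alpha) + \operatorname{Re}(\beta)}{\sqrt{2}} & \frac{\operatorname{Im}(\beta) - \operatorname{Im}(\alpha)}{\sqrt{2}} \\
\frac{\operatorname{Im}(\beta) + \operatorname{Im}(\alpha)}{\sqrt{2}} & \frac{\operatorname{Re}(\alpha) - \operatorname{Re}(\beta)}{\sqrt{2}}
\end{pmatrix}
\begin{pmatrix} \sqrt{2}\operatorname{Re}(W) \\ \sqrt{2}\operatorname{Im}(W) \end{pmatrix}. \tag{24.12}
\]
Since, by the definition of a standard complex Gaussian $W$,
\[
\begin{pmatrix} \sqrt{2}\operatorname{Re}(W) \\ \sqrt{2}\operatorname{Im}(W) \end{pmatrix}
\stackrel{\mathscr{L}}{=}
\begin{pmatrix} W_1 \\ W_2 \end{pmatrix}, \tag{24.13}
\]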
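it follows from (24.11), (24.12), and (24.13) that if $\alpha$ and $\beta$ are chosen so that
\[
\begin{pmatrix}
\frac{\operatorname{Re}(\alpha) + \operatorname{Re}(\beta)}{\sqrt{2}} & \frac{\operatorname{Im}(\beta) - \operatorname{Im}(\alpha)}{\sqrt{2}} \\
\frac{\operatorname{Im}(\beta) + \operatorname{Im}(\alpha)}{\sqrt{2}} & \frac{\operatorname{Re}(\alpha) - \operatorname{Re}(\beta)}{\sqrt{2}}
\end{pmatrix}
=
\begin{pmatrix} a^{(1,1)} & a^{(1,2)} \\ a^{(2,1)} & a^{(2,2)} \end{pmatrix},
\]
i.e., if
\begin{align*}
\alpha &= \frac{1}{\sqrt{2}} \Bigl( \bigl(a^{(1,1)} + a^{(2,2)}\bigr) + i\bigl(a^{(2,1)} - a^{(1,2)}\bigr) \Bigr), \\
\beta &= \frac{1}{\sqrt{2}} \Bigl( \bigl(a^{(1,1)} - a^{(2,2)}\bigr) + i\bigl(a^{(2,1)} + a^{(1,2)}\bigr) \Bigr),
\end{align*}
then
\[
\begin{pmatrix} \operatorname{Re}(Z) \\ \operatorname{Im}(Z) \end{pmatrix}
\stackrel{\mathscr{L}}{=}
\begin{pmatrix} \operatorname{Re}(\alpha W + \beta W^*) \\ \operatorname{Im}(\alpha W + \beta W^*) \end{pmatrix}
\]
and (24.10) is satisfied.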
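The choice of α and β in Part (i) can be sanity-checked by plugging the two formulas back into the matrix of (24.12). The sketch below does this for one arbitrary real 2 × 2 matrix. The function names and the numerical entries are assumptions made here for illustration only.

```python
import math


def alpha_beta(a11, a12, a21, a22):
    """Coefficients alpha, beta built from the real matrix entries as in Part (i)."""
    s = 1 / math.sqrt(2)
    alpha = s * complex(a11 + a22, a21 - a12)
    beta = s * complex(a11 - a22, a21 + a12)
    return alpha, beta


def matrix_from(alpha, beta):
    """The real 2x2 matrix multiplying (sqrt(2) Re W, sqrt(2) Im W) in (24.12)."""
    s = 1 / math.sqrt(2)
    return [
        [s * (alpha.real + beta.real), s * (beta.imag - alpha.imag)],
        [s * (beta.imag + alpha.imag), s * (alpha.real - beta.real)],
    ]


# An arbitrary real matrix playing the role of (a^(j,k)) in (24.11).
a = [[2.0, -1.0], [0.5, 3.0]]
alpha, beta = alpha_beta(a[0][0], a[0][1], a[1][0], a[1][1])
recovered = matrix_from(alpha, beta)

max_error = max(abs(recovered[j][k] - a[j][k]) for j in range(2) for k in range(2))
print(alpha, beta, max_error)
```

The recovered matrix agrees with the original up to rounding, so αW + βW* has the same real representation as Z in (24.11).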


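We next turn to Part (ii). One direction is straightforward: if $Z \stackrel{\mathscr{L}}{=} \alpha W$, then $Z$
must be proper because from (24.3) it follows that $\mathsf{E}[\alpha W] = \alpha \mathsf{E}[W] = 0$ and
$\mathsf{E}[(\alpha W)^2] = \alpha^2 \mathsf{E}[W^2] = 0$.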

We next prove the other direction that if $Z$ is a proper complex Gaussian, then
$Z \stackrel{\mathscr{L}}{=} \alpha W$ for some $\alpha \in \mathbb{C}$ and some standard complex Gaussian $W$. Let $Z$ be a
proper complex Gaussian. By Part (i) it follows that there exist $\alpha, \beta \in \mathbb{C}$ such that
(24.10) is satisfied. Consequently, for this choice of $\alpha$ and $\beta$ we have
\begin{align*}
0 &= \mathsf{E}[Z^2] \\
&= \mathsf{E}\bigl[(\alpha W + \beta W^*)^2\bigr] \\
&= \alpha^2 \mathsf{E}[W^2] + 2\alpha\beta\, \mathsf{E}[W W^*] + \beta^2 \mathsf{E}\bigl[(W^*)^2\bigr] \\
&= 2\alpha\beta,
\end{align*}
where the first equality follows because $Z$ is proper; the second because $\alpha$ and $\beta$
have been chosen so that (24.10) holds; the third by opening the brackets and using
the linearity of expectation; and the fourth by (24.3) and (24.2). It follows that
either $\alpha$ or $\beta$ must be zero. Since $W \stackrel{\mathscr{L}}{=} W^*$, there is no loss in generality in assuming
that $\beta = 0$, thus establishing the existence of $\alpha \in \mathbb{C}$ such that $Z \stackrel{\mathscr{L}}{=} \alpha W$.


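By Proposition 24.2.11 (ii) we conclude that if $Z$ is a proper complex Gaussian, then
$Z \stackrel{\mathscr{L}}{=} \alpha W$ for some $\alpha \in \mathbb{C}$ and some standard complex Gaussian $W$. Consequently,
the density of such a CRV $Z$ (that is not deterministically zero) is given by
\begin{align*}
f_Z(z) &= \frac{f_W(z/\alpha)}{|\alpha|^2} \\
&= \frac{1}{\pi |\alpha|^2}\, e^{-|z|^2 / |\alpha|^2}, \quad z \in \mathbb{C},
\end{align*}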


where the first equality follows from the way the density of a CRV behaves under 
linear transformations (Theorem 17.3.7 or Lemma 17.4.6), and where the second 
equality follows from (24.1). We thus conclude that if Z is a proper complex Gaus-
sian, then its density is radially-symmetric, and Z must be circularly-symmetric. 
The reverse is also true: since every complex Gaussian is of finite variance, and since 
every finite-variance circularly-symmetric CRV is also proper (Proposition 24.2.8), 
we conclude that every circularly-symmetric complex Gaussian is proper. Thus: 


Proposition 24.2.12. A complex Gaussian is circularly-symmetric if, and only if,
it is proper.


The picture that thus emerges is the following. 


(i) Every finite-variance circularly-symmetric CRV is proper. 
(ii) Some proper CRVs are not circularly symmetric. 
(iii) A Gaussian CRV is circularly-symmetric if, and only if, it is proper.


We shall soon see that these observations extend to vectors too. In fact, the reader 
is encouraged to consult Figure 24.2 on Page 508, which holds also for CRVs. 


24.3 Vectors 


24.3.1 Standard Complex Gaussian Vectors 


Definition 24.3.1 (Standard Complex Gaussian Vector). A standard complex 
Gaussian vector is a complex random vector whose components are IID and each 
of them is a standard complex Gaussian random variable. 


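If $\mathbf{W}$ is a standard complex Gaussian $n$-vector, then, by the independence of its $n$
components and by (24.1), its density is given by
\[
f_{\mathbf{W}}(\mathbf{w}) = \frac{1}{\pi^n}\, e^{-\mathbf{w}^\dagger \mathbf{w}}, \quad \mathbf{w} \in \mathbb{C}^n. \tag{24.14}
\]
By the independence of its components and by (24.3)
\[
\mathsf{E}[\mathbf{W}] = \mathbf{0} \quad \text{and} \quad \mathsf{E}\bigl[\mathbf{W}\mathbf{W}^{\mathsf{T}}\bigr] = 0. \tag{24.15}
\]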


Thus, every standard complex Gaussian vector is proper (Section 17.4.2). By the 
independence of the components and by (24.2) it also follows that 


\[
\mathsf{E}\bigl[\mathbf{W}\mathbf{W}^\dagger\bigr] = I_n, \tag{24.16}
\]
where we remind the reader that $I_n$ denotes the $n \times n$ identity matrix.


24.3.2 Circularly-Symmetric Complex Random Vectors 


Definition 24.3.2 (Circularly-Symmetric Complex Random Vectors). We say that
the complex random vector $\mathbf{Z}$ is circularly-symmetric if for every $\phi \in [-\pi, \pi)$
the law of $e^{i\phi}\mathbf{Z}$ is identical to the law of $\mathbf{Z}$.
An equivalent definition can be given in terms of linear functionals:


Proposition 24.3.3 (Circular Symmetry and Linear Functionals). Each of the fol- 
lowing statements is equivalent to the statement that the complex random n-vector Z 
is circularly-symmetric. 


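(a) For every $\phi \in [-\pi, \pi)$ the law of the complex random vector $e^{i\phi}\mathbf{Z}$ is the same
as the law of $\mathbf{Z}$:
\[
e^{i\phi}\mathbf{Z} \stackrel{\mathscr{L}}{=} \mathbf{Z}, \quad \phi \in [-\pi, \pi). \tag{24.17}
\]

(b) For every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$, the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric:
\[
e^{i\phi}\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z} \stackrel{\mathscr{L}}{=} \boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}, \quad \bigl(\boldsymbol{\alpha} \in \mathbb{C}^n,\ \phi \in [-\pi, \pi)\bigr). \tag{24.18}
\]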


Proof. Statement (a) is just the definition of circular symmetry. We next show 
that the two statements (a) and (b) are equivalent. We begin by proving that (a) 
implies (b). This is the easy part because applying the same linear functional to 
two random vectors that have the same law results in random variables that have 
the same law. Consequently, (24.17) implies (24.18). 


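We now prove that (b) implies (a). We thus assume (24.18) and set out to prove
(24.17). By Theorem 17.4.4 it follows that to establish (24.17) it suffices to show
that the random vectors on the RHS and LHS of (24.17) have the same characteristic
function, i.e., that
\[
\mathsf{E}\Bigl[e^{i\operatorname{Re}(\boldsymbol{\varpi}^\dagger e^{i\phi}\mathbf{Z})}\Bigr] = \mathsf{E}\Bigl[e^{i\operatorname{Re}(\boldsymbol{\varpi}^\dagger \mathbf{Z})}\Bigr], \quad \boldsymbol{\varpi} \in \mathbb{C}^n. \tag{24.19}
\]
But this readily follows from (24.18) because upon substituting $\boldsymbol{\varpi}^\dagger$ for $\boldsymbol{\alpha}^{\mathsf{T}}$ in
(24.18) we obtain that
\[
\boldsymbol{\varpi}^\dagger \mathbf{Z} \stackrel{\mathscr{L}}{=} \boldsymbol{\varpi}^\dagger e^{i\phi}\mathbf{Z}, \quad \boldsymbol{\varpi} \in \mathbb{C}^n,
\]
and this implies (24.19), because if $Z_1 \stackrel{\mathscr{L}}{=} Z_2$, then $\mathsf{E}[g(Z_1)] = \mathsf{E}[g(Z_2)]$ for any
measurable function $g$ and, in particular, for the function $g\colon \xi \mapsto e^{i\operatorname{Re}(\xi)}$.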


The following proposition demonstrates that circular symmetry is preserved by 
linear transformations. 


Proposition 24.3.4 (Circular Symmetry and Linear Transformations). Let $\mathbf{Z}$ be a
circularly-symmetric complex random $n$-vector and let $A$ be a deterministic complex
$m \times n$ matrix. Then the complex random $m$-vector $A\mathbf{Z}$ is also circularly-symmetric.


Proof. By Proposition 24.3.3 it follows that to establish that $A\mathbf{Z}$ is circularly-symmetric
it suffices to show that for every deterministic $\boldsymbol{\alpha} \in \mathbb{C}^m$ the random
variable $\boldsymbol{\alpha}^{\mathsf{T}} A\mathbf{Z}$ is circularly-symmetric. To show this, fix some arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^m$.
Because $\mathbf{Z}$ is circularly-symmetric, it follows from Proposition 24.3.3 that for every
deterministic vector $\boldsymbol{\beta} \in \mathbb{C}^n$, the random variable $\boldsymbol{\beta}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric.
Choosing $\boldsymbol{\beta} = A^{\mathsf{T}}\boldsymbol{\alpha}$ establishes that $\boldsymbol{\alpha}^{\mathsf{T}} A\mathbf{Z}$ is circularly-symmetric.
24.3.3 Proper vs. Circularly-Symmetric Vectors


We now extend the relationship between properness and circular symmetry to 
vectors: 


Proposition 24.3.5 (Circular Symmetry Implies Properness). 


(i) Every finite-variance circularly-symmetric random vector is proper. 


(ii) Some proper random vectors are not circularly-symmetric.


Proof. Part (ii) requires no proof because a CRV can be viewed as a complex
random vector taking value in $\mathbb{C}^1$, and we have already seen in Section 24.2.3 an
example of a CRV which is proper but not circularly-symmetric (Note 24.2.9).


We now prove Part (i). Let $\mathbf{Z}$ be a finite-variance circularly-symmetric random
$n$-vector. To establish that $\mathbf{Z}$ is proper we will show that for every $\boldsymbol{\alpha} \in \mathbb{C}^n$ the
CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is proper (Proposition 17.4.2). To this end, fix an arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^n$.
By Proposition 24.3.3 it follows that the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is circularly-symmetric. And
because $\mathbf{Z}$ is of finite variance, so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$. Being a circularly-symmetric CRV of
finite variance, it follows from Section 24.2.3 that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ must be proper.


24.3.4 Complex Gaussian Vectors 


Definition 24.3.6 (Complex Gaussian Vectors). A complex random n-vector Z is 
said to be a complex Gaussian vector if the real random 2n-vector 


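\[
\bigl(\operatorname{Re}(Z^{(1)}), \ldots, \operatorname{Re}(Z^{(n)}), \operatorname{Im}(Z^{(1)}), \ldots, \operatorname{Im}(Z^{(n)})\bigr)^{\mathsf{T}} \tag{24.20}
\]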


consisting of the real and imaginary parts of its components is a real Gaussian 
vector. A centered complex Gaussian vector is a zero-mean complex Gaussian 
vector. 


Note that, Theorem 23.6.7 notwithstanding, the distribution of a centered complex 
Gaussian vector is not uniquely specified by its covariance matrix. It is uniquely 
specified by the covariance matrix if the Gaussian vector is additionally known to 
be proper. This is a direct consequence of the following proposition. 


Proposition 24.3.7. The distribution of a centered complex Gaussian vector Z is 
uniquely specified by the matrices 


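\[
K = \mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^\dagger\bigr] \quad \text{and} \quad L = \mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr].
\]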


Proof. Let R be the real 2n-vector that results from stacking the real part of Z on 
top of its imaginary part as in (24.20). We will prove the proposition by showing 
that the matrices K and L uniquely specify the distribution of R. 

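Since $\mathbf{Z}$ is a complex Gaussian $n$-vector, $\mathbf{R}$ is a real Gaussian $2n$-vector. Since $\mathbf{Z}$ is
of zero mean, so is $\mathbf{R}$. Consequently, the distribution of $\mathbf{R}$ is fully characterized by
its covariance matrix $\mathsf{E}[\mathbf{R}\mathbf{R}^{\mathsf{T}}]$ (Theorem 23.6.7). The proof will thus be concluded
once we show that the matrices $L$ and $K$ determine the covariance matrix of $\mathbf{R}$.
Indeed, as we next verify,
\[
\mathsf{E}\bigl[\mathbf{R}\mathbf{R}^{\mathsf{T}}\bigr] = \frac{1}{2}
\begin{pmatrix}
\operatorname{Re}(K) + \operatorname{Re}(L) & \operatorname{Im}(L) - \operatorname{Im}(K) \\
\operatorname{Im}(L) + \operatorname{Im}(K) & \operatorname{Re}(K) - \operatorname{Re}(L)
\end{pmatrix}. \tag{24.21}
\]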


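To verify (24.21) one needs to compute each of the block entries separately. We
shall see how this is done by computing the top-right entry. The rest of the entries
are left for the reader to verify.
\begin{align*}
\mathsf{E}\bigl[\operatorname{Re}(\mathbf{Z}) \operatorname{Im}(\mathbf{Z})^{\mathsf{T}}\bigr]
&= \mathsf{E}\biggl[\Bigl(\frac{\mathbf{Z} + \mathbf{Z}^*}{2}\Bigr) \Bigl(\frac{\mathbf{Z} - \mathbf{Z}^*}{2i}\Bigr)^{\mathsf{T}}\biggr] \\
&= \frac{1}{2} \biggl( \frac{\mathsf{E}[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}] - \mathsf{E}[\mathbf{Z}^*\mathbf{Z}^\dagger]}{2i} - \frac{\mathsf{E}[\mathbf{Z}\mathbf{Z}^\dagger] - \mathsf{E}[\mathbf{Z}^*\mathbf{Z}^{\mathsf{T}}]}{2i} \biggr) \\
&= \frac{1}{2} \bigl(\operatorname{Im}(L) - \operatorname{Im}(K)\bigr).
\end{align*}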
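Since (24.21) is an algebraic identity between second moments, it can be checked on any finite collection of equiprobable complex vectors. The sketch below does so in plain Python. The four sample vectors are arbitrary values chosen here (in ± pairs so that the mean is zero), and the function names are introduced for illustration only.

```python
def second_moments(samples):
    """K = E[Z Z^dagger] and L = E[Z Z^T] for equiprobable sample vectors."""
    n, m = len(samples[0]), len(samples)
    K = [[sum(z[j] * z[k].conjugate() for z in samples) / m for k in range(n)]
         for j in range(n)]
    L = [[sum(z[j] * z[k] for z in samples) / m for k in range(n)]
         for j in range(n)]
    return K, L


def real_covariance_direct(samples):
    """E[R R^T], where R stacks the real parts on top of the imaginary parts."""
    n, m = len(samples[0]), len(samples)
    stacked = [[c.real for c in z] + [c.imag for c in z] for z in samples]
    return [[sum(r[j] * r[k] for r in stacked) / m for k in range(2 * n)]
            for j in range(2 * n)]


def real_covariance_from_K_L(K, L):
    """Right-hand side of (24.21), assembled block by block."""
    n = len(K)
    top = [[0.5 * (K[j][k].real + L[j][k].real) for k in range(n)]
           + [0.5 * (L[j][k].imag - K[j][k].imag) for k in range(n)]
           for j in range(n)]
    bottom = [[0.5 * (L[j][k].imag + K[j][k].imag) for k in range(n)]
              + [0.5 * (K[j][k].real - L[j][k].real) for k in range(n)]
              for j in range(n)]
    return top + bottom


# Arbitrary zero-mean example: each vector appears together with its negative.
samples = [(1 + 2j, -1j), (-1 - 2j, 1j), (0.5 - 1j, 2 + 1j), (-0.5 + 1j, -2 - 1j)]

K, L = second_moments(samples)
direct = real_covariance_direct(samples)
from_blocks = real_covariance_from_K_L(K, L)
max_error = max(abs(direct[j][k] - from_blocks[j][k])
                for j in range(4) for k in range(4))
print(max_error)
```

The two matrices agree up to rounding, which covers all four blocks and not only the top-right entry computed above.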


Corollary 24.3.8. The distribution of a proper complex Gaussian vector is uniquely
specified by its covariance matrix.


Proof. Follows from Proposition 24.3.7 by noting that by specifying that a complex 
Gaussian is proper we are specifying that the matrix L is zero (Definition 17.4.1). 


Proposition 24.3.9 (Linear Transformations of Complex Gaussians). If $\mathbf{Z}$ is a
complex Gaussian $n$-vector and if $A$ and $B$ are deterministic $m \times n$ complex matrices,
then the $m$-vector


AZ + BZ* 


is a complex Gaussian. 


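Proof. Define the complex random $m$-vector $\mathbf{C} \triangleq A\mathbf{Z} + B\mathbf{Z}^*$. To prove that $\mathbf{C}$ is
Gaussian we recall that linearly transforming a real Gaussian vector yields a real
Gaussian vector (Proposition 23.6.3), and we note that the real random $2m$-vector
whose components are the real and imaginary parts of $\mathbf{C}$ can be expressed as the
result of applying a linear transformation to the real Gaussian $2n$-vector whose
components are the real and imaginary parts of the components of $\mathbf{Z}$:
\[
\begin{pmatrix} \operatorname{Re}(\mathbf{C}) \\ \operatorname{Im}(\mathbf{C}) \end{pmatrix}
=
\begin{pmatrix}
\operatorname{Re}(A) + \operatorname{Re}(B) & \operatorname{Im}(B) - \operatorname{Im}(A) \\
\operatorname{Im}(A) + \operatorname{Im}(B) & \operatorname{Re}(A) - \operatorname{Re}(B)
\end{pmatrix}
\begin{pmatrix} \operatorname{Re}(\mathbf{Z}) \\ \operatorname{Im}(\mathbf{Z}) \end{pmatrix}.
\]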


Proposition 24.3.10 (Characterizing Complex Gaussian Vectors). Each of the 
following statements is equivalent to the statement that Z is a complex Gaussian 
n-vector. 


(a) The real random vector whose 2n components correspond to the real and 
imaginary parts of Z is a real Gaussian vector. 
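(b) For every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$, the CRV $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a complex Gaussian
random variable.


(c) There exist complex $n \times m$ matrices $A$ and $B$ and a vector $\boldsymbol{\mu} \in \mathbb{C}^n$ such that
\[
\mathbf{Z} \stackrel{\mathscr{L}}{=} A\mathbf{W} + B\mathbf{W}^* + \boldsymbol{\mu}
\]
for some standard complex Gaussian random $m$-vector $\mathbf{W}$.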


Proof. Statement (a) is just the definition of a Gaussian complex random vector. 


We next prove the equivalence of (a) and (b). That (a) implies (b) follows from
Proposition 24.3.9 (by substituting $\boldsymbol{\alpha}^{\mathsf{T}}$ for $A$ and $0$ for $B$).


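To prove that (b) $\Rightarrow$ (a) it suffices (by Definition 24.3.6 and Theorem 23.6.17) to
show that (b) implies that any real linear functional of the real random $2n$-vector
comprising the real and imaginary parts of $\mathbf{Z}$ is a real Gaussian random variable,
i.e., that for every choice of the real constants $\alpha^{(1)}, \ldots, \alpha^{(n)}$ and $\beta^{(1)}, \ldots, \beta^{(n)}$ the
random variable
\[
\sum_{j=1}^{n} \alpha^{(j)} \operatorname{Re}\bigl(Z^{(j)}\bigr) + \sum_{j=1}^{n} \beta^{(j)} \operatorname{Im}\bigl(Z^{(j)}\bigr) \tag{24.22}
\]
is a Gaussian real random variable. To that end we rewrite (24.22) as
\begin{align*}
\sum_{j=1}^{n} \alpha^{(j)} \operatorname{Re}\bigl(Z^{(j)}\bigr) + \sum_{j=1}^{n} \beta^{(j)} \operatorname{Im}\bigl(Z^{(j)}\bigr)
&= \boldsymbol{\alpha}^{\mathsf{T}} \operatorname{Re}(\mathbf{Z}) + \boldsymbol{\beta}^{\mathsf{T}} \operatorname{Im}(\mathbf{Z}) \tag{24.23} \\
&= \operatorname{Re}\bigl((\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}} \mathbf{Z}\bigr), \tag{24.24}
\end{align*}
where we define the real vectors $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ as $\boldsymbol{\alpha} \triangleq (\alpha^{(1)}, \ldots, \alpha^{(n)})^{\mathsf{T}} \in \mathbb{R}^n$ and
$\boldsymbol{\beta} \triangleq (\beta^{(1)}, \ldots, \beta^{(n)})^{\mathsf{T}} \in \mathbb{R}^n$. Now (b) implies that $(\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}}\mathbf{Z}$ is a Gaussian
complex random variable, so its real part $\operatorname{Re}\bigl((\boldsymbol{\alpha} - i\boldsymbol{\beta})^{\mathsf{T}}\mathbf{Z}\bigr)$ must be a real Gaussian
random variable (Definition 24.2.10 and Proposition 23.6.6), thus establishing
that (b) implies that (24.22) is a real Gaussian random variable.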


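We next turn to proving the equivalence of (a) and (c). That (c) implies (a) follows
directly from Proposition 24.3.9 applied to the Gaussian vector $\mathbf{W}$. The proof of
the implication (a) $\Rightarrow$ (c) is very similar to the proof of its scalar version (24.10).
We first note that since we can choose $\boldsymbol{\mu} = \mathsf{E}[\mathbf{Z}]$, it suffices to prove the result for
the centered case. Now (a) implies that there exist $n \times n$ matrices $D, E, F, G$ such
that
\[
\begin{pmatrix} \operatorname{Re}(\mathbf{Z}) \\ \operatorname{Im}(\mathbf{Z}) \end{pmatrix}
\stackrel{\mathscr{L}}{=}
\begin{pmatrix} D & E \\ F & G \end{pmatrix}
\begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix}, \tag{24.25}
\]
where $\mathbf{W}_1$ and $\mathbf{W}_2$ are independent real standard Gaussian $n$-vectors (Definition 23.1.1).
On the other hand
\[
\begin{pmatrix} \operatorname{Re}(A\mathbf{W} + B\mathbf{W}^*) \\ \operatorname{Im}(A\mathbf{W} + B\mathbf{W}^*) \end{pmatrix}
=
\begin{pmatrix}
\frac{\operatorname{Re}(A) + \operatorname{Re}(B)}{\sqrt{2}} & \frac{\operatorname{Im}(B) - \operatorname{Im}(A)}{\sqrt{2}} \\
\frac{\operatorname{Im}(B) + \operatorname{Im}(A)}{\sqrt{2}} & \frac{\operatorname{Re}(A) - \operatorname{Re}(B)}{\sqrt{2}}
\end{pmatrix}
\begin{pmatrix} \sqrt{2}\operatorname{Re}(\mathbf{W}) \\ \sqrt{2}\operatorname{Im}(\mathbf{W}) \end{pmatrix}. \tag{24.26}
\]
If $\mathbf{W}$ is a standard complex Gaussian, then
\[
\begin{pmatrix} \sqrt{2}\operatorname{Re}(\mathbf{W}) \\ \sqrt{2}\operatorname{Im}(\mathbf{W}) \end{pmatrix}
\stackrel{\mathscr{L}}{=}
\begin{pmatrix} \mathbf{W}_1 \\ \mathbf{W}_2 \end{pmatrix},
\]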
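where $\mathbf{W}_1$ and $\mathbf{W}_2$ are as above. Consequently, the representations (24.25) and
(24.26) agree if
\[
\begin{pmatrix} D & E \\ F & G \end{pmatrix}
=
\begin{pmatrix}
\frac{\operatorname{Re}(A) + \operatorname{Re}(B)}{\sqrt{2}} & \frac{\operatorname{Im}(B) - \operatorname{Im}(A)}{\sqrt{2}} \\
\frac{\operatorname{Im}(B) + \operatorname{Im}(A)}{\sqrt{2}} & \frac{\operatorname{Re}(A) - \operatorname{Re}(B)}{\sqrt{2}}
\end{pmatrix},
\]
i.e., if we set
\begin{align*}
A &= \frac{1}{\sqrt{2}} \bigl( (D + G) + i(F - E) \bigr), \\
B &= \frac{1}{\sqrt{2}} \bigl( (D - G) + i(F + E) \bigr).
\end{align*}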


24.3.5 Proper Complex Gaussian Vectors 


A proper complex Gaussian vector is a complex Gaussian vector that is also proper 
(Definition 17.4.1). Thus, Z is a proper complex Gaussian vector if it is a centered 
complex Gaussian vector satisfying $\mathsf{E}\bigl[\mathbf{Z}\mathbf{Z}^{\mathsf{T}}\bigr] = 0$.


Recall that, by Proposition 24.3.5, every finite-variance circularly-symmetric com- 
plex random vector is also proper, but that some random vectors are proper and not 
circularly-symmetric. We next show that for Gaussian vectors, circular symmetry 
is equivalent to properness. 


Proposition 24.3.11 (For Complex Gaussians, Proper = Circularly-Symmetric). 
A complex Gaussian vector is proper if, and only if, it is circularly-symmetric. 


Proof. Every circularly-symmetric complex Gaussian is proper, because every com- 
plex Gaussian is of finite-variance, and every finite-variance circularly-symmetric 
complex random vector is proper (Proposition 24.3.5). 


We now turn to the reverse implication, i.e., that if a complex Gaussian vector 
is proper, then it is circularly-symmetric. Assume that Z is a proper Gaussian 
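$n$-vector. We will prove that $\mathbf{Z}$ is circularly-symmetric using Proposition 24.3.3 by
showing that for every deterministic vector $\boldsymbol{\alpha} \in \mathbb{C}^n$ the random variable $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is
circularly-symmetric.


To that end, fix some arbitrary $\boldsymbol{\alpha} \in \mathbb{C}^n$. Since $\mathbf{Z}$ is a Gaussian vector, it follows that
$\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a Gaussian CRV (Proposition 24.3.9 with the substitution of $\boldsymbol{\alpha}^{\mathsf{T}}$ for $A$ and
$0$ for $B$). Moreover, since $\mathbf{Z}$ is proper, so is $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ (Proposition 17.4.2). We have thus
established that $\boldsymbol{\alpha}^{\mathsf{T}}\mathbf{Z}$ is a proper Gaussian CRV and hence, by Proposition 24.2.12,
also circularly-symmetric.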


The relationship between circular symmetry, properness, and Gaussianity is illus- 
trated in Figure 24.2. 


We next address the existence of a proper complex Gaussian of a given covariance
matrix. We first recall that we say that a complex $n \times n$ matrix $K$ is complex
positive semidefinite and write $K \succeq 0$ if $\boldsymbol{\alpha}^\dagger K \boldsymbol{\alpha}$ is a nonnegative real number for
every $\boldsymbol{\alpha} \in \mathbb{C}^n$. Recall also that an $n \times n$ complex matrix $K$ is a complex positive
semidefinite matrix if, and only if, there exists a complex $n \times n$ matrix $S$ such that
$K = SS^\dagger$; see (Axler, 1997, Chapter 7, Theorem 7.27).
Figure 24.2: The relationship between circular symmetry, Gaussianity, and proper-
ness. The outer region corresponds to all complex random vectors. Within that is 
the set of all vectors whose components are of finite variance. Within it is the family 
of all proper random vectors. The slanted lines indicate the circularly-symmetric 
vectors, and the gray area corresponds to the Gaussian vectors. The same relations 
hold for scalars and for stochastic processes. 


Proposition 24.3.12. 


(i) Given any $n \times n$ complex positive semidefinite matrix $K$, there exists a proper
complex Gaussian $n$-vector whose covariance matrix is $K$.


(ii) The distribution of a proper Gaussian complex vector is fully specified by its
covariance matrix.


(iii) If $\mathbf{Z}$ is a proper complex Gaussian $n$-vector of nonsingular covariance matrix
$K$, then its density is given by:
\[
f_{\mathbf{Z}}(\mathbf{z}) = \frac{1}{\pi^n \det K}\, e^{-\mathbf{z}^\dagger K^{-1} \mathbf{z}}, \quad \mathbf{z} \in \mathbb{C}^n. \tag{24.27}
\]


Note 24.3.13. We denote the distribution of a proper Gaussian complex vector of 
covariance matrix K by 


\[
\mathcal{N}_{\mathbb{C}}(\mathbf{0}, K).
\]


Proof. To prove (i) we note that since K is positive semidefinite, it follows that 
there exists an n x n matrix S such that 


\[
K = SS^\dagger. \tag{24.28}
\]
Consider now the vector
\[
\mathbf{Z} = S\mathbf{W}, \tag{24.29}
\]


where W is a standard complex Gaussian n-vector. We will show that Z has the 
desired properties. First, it must be Gaussian because it is the result of applying 
a deterministic linear mapping to the Gaussian vector W (Proposition 24.3.9). It 
is centered because W is centered (24.15) and because E[SW] = SE[W]. It is 
proper because it is the result of linearly transforming the proper complex random 
vector W (Proposition 17.4.3 and (24.15)). Finally, its covariance matrix is 


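\begin{align*}
\mathsf{E}\bigl[(S\mathbf{W})(S\mathbf{W})^\dagger\bigr] &= \mathsf{E}\bigl[S\mathbf{W}\mathbf{W}^\dagger S^\dagger\bigr] \\
&= S\, \mathsf{E}\bigl[\mathbf{W}\mathbf{W}^\dagger\bigr] S^\dagger \\
&= S I_n S^\dagger \\
&= K.
\end{align*}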
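The construction Z = SW needs a concrete factor S with K = SS†. One way to obtain it, sketched below, is a Cholesky factorization written in plain Python. This is only an illustration under a stronger assumption than the proposition makes: the routine requires K to be Hermitian and strictly positive definite, whereas a merely semidefinite K would call for a different factorization (for example one based on an eigendecomposition). The function names and the example matrix are choices made here.

```python
import math


def cholesky(K):
    """Lower-triangular S with S S^dagger = K, for Hermitian positive definite K."""
    n = len(K)
    S = [[0j] * n for _ in range(n)]
    for j in range(n):
        d = K[j][j].real - sum(abs(S[j][k]) ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("K is not positive definite")
        S[j][j] = complex(math.sqrt(d))
        for i in range(j + 1, n):
            acc = K[i][j] - sum(S[i][k] * S[j][k].conjugate() for k in range(j))
            S[i][j] = acc / S[j][j]
    return S


def times_conjugate_transpose(S):
    """Compute S S^dagger."""
    n = len(S)
    return [[sum(S[i][k] * S[j][k].conjugate() for k in range(n))
             for j in range(n)] for i in range(n)]


# An illustrative Hermitian positive definite covariance matrix.
K = [[2 + 0j, 1j],
     [-1j, 2 + 0j]]

S = cholesky(K)
reconstructed = times_conjugate_transpose(S)
max_error = max(abs(reconstructed[i][j] - K[i][j])
                for i in range(2) for j in range(2))
print(S, max_error)
```

Applying such an S to a standard complex Gaussian vector W then gives, by the computation above, a proper complex Gaussian with covariance matrix K.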


Part (ii) was proved in Corollary 24.3.8. 


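To prove (iii) we use (24.29) & (24.28) along with the change of variables formula
(Lemma 17.4.6) and the density of a standard Gaussian complex random vector
(24.14) to obtain
\begin{align*}
f_{\mathbf{Z}}(\mathbf{z}) &= \frac{1}{|\det S|^2}\, f_{\mathbf{W}}\bigl(S^{-1}\mathbf{z}\bigr) \\
&= \frac{1}{\pi^n \det(SS^\dagger)}\, e^{-(S^{-1}\mathbf{z})^\dagger (S^{-1}\mathbf{z})} \\
&= \frac{1}{\pi^n \det K}\, e^{-\mathbf{z}^\dagger K^{-1} \mathbf{z}}.
\end{align*}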


24.4 Exercises 


Exercise 24.1 (The Complex Conjugate of a Circularly-Symmetric CRV). Must the com- 
plex conjugate of a circularly-symmetric CRV be circularly-symmetric? 


Exercise 24.2 (Scaled Circularly-Symmetric CRV). Show that if $Z$ is circularly-symmetric
and if $\alpha \in \mathbb{C}$ is deterministic, then the distribution of $\alpha Z$ depends on $\alpha$ only via its
magnitude $|\alpha|$.


Exercise 24.3 (The n-th Power of a Circularly-Symmetric CRV). Show that if $Z$ is a
circularly-symmetric CRV and if $n$ is a positive integer, then $Z^n$ is circularly-symmetric.


Exercise 24.4 (The Characteristic Function of Circularly-Symmetric CRVs). Show that a
CRV $Z$ is circularly-symmetric if, and only if, its characteristic function $\Phi_Z(\cdot)$ is radially-symmetric
in the sense that $\Phi_Z(\varpi)$ depends on $\varpi$ only via its magnitude $|\varpi|$.


Exercise 24.5 (Multiplying Independent CRVs). Show that the product of two indepen- 
dent complex random variables is circularly-symmetric whenever (at least) one of them 
is circularly-symmetric. 
Exercise 24.6 (The Complex Conjugate of a Gaussian CRV). Must the complex conjugate
of a Gaussian CRV be Gaussian? 


Exercise 24.7 (Independent Components). Show that if the complex random variables
$W$ and $Z$ are circularly-symmetric and independent, then the random vector $(W, Z)^{\mathsf{T}}$ is
circularly-symmetric.


Exercise 24.8 (The Characteristic Function of a Proper Complex Gaussian Vector). 
Compute the characteristic function of a proper complex Gaussian vector of covariance 
matrix K. 


Exercise 24.9 (Jointly Circularly-Symmetric Complex Gaussians). As in Definition 23.7.1, 
we can also define jointly complex Gaussians and jointly circularly-symmetric complex 
Gaussians. Extend the results of Section 23.7 by showing: 


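(i) Two centered jointly complex Gaussian vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are independent if, and
only if, they satisfy
\[
\mathsf{E}\bigl[\mathbf{Z}_1 \mathbf{Z}_2^\dagger\bigr] = 0 \quad \text{and} \quad \mathsf{E}\bigl[\mathbf{Z}_1 \mathbf{Z}_2^{\mathsf{T}}\bigr] = 0.
\]

(ii) Two jointly circularly-symmetric complex Gaussian vectors $\mathbf{Z}_1$ and $\mathbf{Z}_2$ are independent
if, and only if, they satisfy
\[
\mathsf{E}\bigl[\mathbf{Z}_1 \mathbf{Z}_2^\dagger\bigr] = 0.
\]

(iii) If $\mathbf{Z}_1, \mathbf{Z}_2$ are centered jointly complex Gaussians, then, conditional on $\mathbf{Z}_2 = \mathbf{z}_2$, the
complex random vector $\mathbf{Z}_1$ is a complex Gaussian such that
\[
\mathsf{E}\Bigl[\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr) \bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)^\dagger \Bigm| \mathbf{Z}_2 = \mathbf{z}_2\Bigr]
\]
and
\[
\mathsf{E}\Bigl[\bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr) \bigl(\mathbf{Z}_1 - \mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]\bigr)^{\mathsf{T}} \Bigm| \mathbf{Z}_2 = \mathbf{z}_2\Bigr]
\]
do not depend on $\mathbf{z}_2$ and such that the conditional mean $\mathsf{E}[\mathbf{Z}_1 \mid \mathbf{Z}_2 = \mathbf{z}_2]$ can be
expressed as $A\mathbf{z}_2 + B\mathbf{z}_2^*$ for some matrices $A$ and $B$ that do not depend on $\mathbf{z}_2$.


(iv) If $\mathbf{Z}_1, \mathbf{Z}_2$ are jointly circularly-symmetric complex Gaussians, then, conditional on
$\mathbf{Z}_2 = \mathbf{z}_2$, the complex random vector $\mathbf{Z}_1$ is a circularly-symmetric complex Gaussian
of a covariance matrix that does not depend on $\mathbf{z}_2$ and of a mean that can be
expressed as $A\mathbf{z}_2$ for some matrix $A$ that does not depend on $\mathbf{z}_2$.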


Exercise 24.10 (Limits of Complex Gaussians). Extend the definition of almost-sure con- 
vergence (23.71) to complex random vectors, and show that if the complex Gaussian 
d-vectors Z1, Z2,... converge to Z almost surely, then Z must be a complex Gaussian. 


Exercise 24.11 (Limits of Circularly-Symmetric Complex Random Variables). Consider
a sequence $Z_1, Z_2, \ldots$ of circularly-symmetric complex random variables that converges
almost surely to the CRV $Z$. Show that $Z$ must be circularly-symmetric. Extend this
result to complex random vectors.


Hint: Consider the characteristic functions of $Z, Z_1, Z_2, \ldots$, and recall the proof of Theorem 19.9.1.
Exercise 24.12 (Limits of Circularly-Symmetric Complex Gaussians). Let $Z_1, Z_2, \ldots$ be
a sequence of circularly-symmetric complex Gaussians that converges almost surely to 
the CRV Z. Show that Z must be a circularly-symmetric Gaussian. Extend to complex 
random vectors. 


Hint: Either combine Exercises 24.10 & 24.11 or prove directly using the characteristic 
function as in the proof of Theorem 19.9.1. 


Chapter 25 


Continuous-Time Stochastic Processes


25.1 Notation 


Recall from Section 12.2 that a continuous-time stochastic process $(X(t),\ t \in \mathbb{R})$
is a family of random variables that are defined on a common probability space
$(\Omega, \mathcal{F}, P)$ and that are indexed by the real line (time). We denote by $X(t)$ the
time-$t$ sample of $(X(t),\ t \in \mathbb{R})$, i.e., the random variable to which $t$ is mapped
(the RV indexed by $t$). This RV is sometimes also called the state at time $t$.
Rather than writing $(X(t),\ t \in \mathbb{R})$, we sometimes denote the SP by $(X(t))$ or
by $X$. Perhaps the clearest way to denote the process is as a mapping:
\[
X\colon \Omega \times \mathbb{R} \to \mathbb{R}, \quad (\omega, t) \mapsto X(\omega, t).
\]
For a fixed $t \in \mathbb{R}$, the time-$t$ sample $X(t)$ is the mapping $X(\cdot, t)$ from $\Omega$ to the real
line, i.e., the RV $\omega \mapsto X(\omega, t)$ indexed by $t$. If we fix $\omega \in \Omega$ and view $X(\omega, \cdot)$ as a
mapping $t \mapsto X(\omega, t)$, then we obtain a function of time. This function is called a
trajectory, sample-path, path, sample-function, or realization.


$\omega \mapsto X(\omega, t)$    time-$t$ sample for a fixed $t \in \mathbb{R}$ (random variable)


$t \mapsto X(\omega, t)$    trajectory for a fixed $\omega \in \Omega$ (function of time)
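The two ways of slicing the mapping X(ω, t) can be made concrete with a toy process. The sketch below is purely illustrative: the three-point sample space, its probabilities, and the process X(ω, t) = A(ω) cos t are all assumptions made here, not taken from the text.

```python
import math

# A hypothetical finite sample space with its probabilities.
PROB = {"w1": 0.5, "w2": 0.25, "w3": 0.25}
# A random amplitude A(omega) defining the toy process.
AMPLITUDE = {"w1": 1.0, "w2": -1.0, "w3": 2.0}


def X(omega, t):
    """The process as a mapping (omega, t) -> X(omega, t)."""
    return AMPLITUDE[omega] * math.cos(t)


def time_sample(t):
    """Fix t: the random variable omega -> X(omega, t)."""
    return lambda omega: X(omega, t)


def trajectory(omega):
    """Fix omega: the function of time t -> X(omega, t)."""
    return lambda t: X(omega, t)


# The time-0 sample is a random variable, so it has an expectation.
mean_at_0 = sum(p * time_sample(0.0)(w) for w, p in PROB.items())
# A trajectory is an ordinary function of time.
path = trajectory("w3")
print(mean_at_0, path(0.0), path(math.pi))
```

Fixing t yields an object one can take expectations of, while fixing ω yields a deterministic waveform.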


Recall also from Section 12.2 that the process is centered if for every t € R the 
RV X(t) is of zero mean. It is of finite variance if for every t € R the RV X(t) 
is of finite variance. 


25.2 The Finite-Dimensional Distributions 


The finite-dimensional distributions (FDDs) of a continuous-time SP is the family
of all joint distributions of $n$-tuples of the form $(X(t_1), \ldots, X(t_n))$, where $n$ can
be any positive integer and $t_1, \ldots, t_n \in \mathbb{R}$ are arbitrary epochs. To specify the
FDDs of a SP $(X(t))$ one must thus specify for every $n \in \mathbb{N}$ and for every choice of
the epochs $t_1, \ldots, t_n \in \mathbb{R}$ the distribution of the $n$-tuple $(X(t_1), \ldots, X(t_n))$. This
is a conceptually clear if formidable task. We denote the cumulative distribution
function of the $n$-tuple $(X(t_1), \ldots, X(t_n))$ by
\[
F_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n) \triangleq \Pr\bigl[X(t_1) \leq \xi_1, \ldots, X(t_n) \leq \xi_n\bigr].
\]


We next show that the FDDs of every SP (X(t)) must satisfy two key properties:
the symmetry property and the consistency property. The symmetry property
is that F_n(·;·) is unaltered when we simultaneously permute its right arguments
(the t’s) and its left arguments (the ξ’s) by the same permutation. That is, for
every n ∈ N; every choice of the epochs t1, ..., tn ∈ R; every ξ1, ..., ξn ∈ R; and
every permutation π on {1, ..., n}


\[ F_n(\xi_{\pi(1)}, \ldots, \xi_{\pi(n)}; t_{\pi(1)}, \ldots, t_{\pi(n)}) = F_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n). \tag{25.1} \]


This property is a generalization to n-tuples of the obvious fact that if X and Y are
random variables, then Pr[X ≤ x, Y ≤ y] = Pr[Y ≤ y, X ≤ x] for every x, y ∈ R.


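The consistency property is that whenever n ∈ N and t1, ..., tn, ξ1, ..., ξ_{n−1} ∈ R,


\[ \lim_{\xi_n \to \infty} F_n(\xi_1, \ldots, \xi_{n-1}, \xi_n; t_1, \ldots, t_{n-1}, t_n) = F_{n-1}(\xi_1, \ldots, \xi_{n-1}; t_1, \ldots, t_{n-1}). \tag{25.2} \]


This property is a consequence of the fact that the set

\[ \{\omega \in \Omega : X(\omega, t_1) \le \xi_1, \ldots, X(\omega, t_{n-1}) \le \xi_{n-1}, X(\omega, t_n) \le \xi_n\} \]

is increasing in ξn and converges as ξn tends to infinity to the set

\[ \{\omega \in \Omega : X(\omega, t_1) \le \xi_1, \ldots, X(\omega, t_{n-1}) \le \xi_{n-1}\}. \]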
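
Both properties can be checked by hand on a toy process. The sketch below assumes the hypothetical SP X(t) = U for all t, where U is uniform on [0, 1], so that every trajectory is constant; the function name `F` and the chosen numbers are illustrative only.

```python
def F(xis, epochs):
    """Joint CDF F_n(xi_1..xi_n; t_1..t_n) of the illustrative SP X(t) = U, U uniform on [0, 1].

    Every sample equals U, so Pr[X(t_1) <= xi_1, ..., X(t_n) <= xi_n] = Pr[U <= min(xi)].
    """
    assert len(xis) == len(epochs) and len(xis) >= 1
    return min(1.0, max(0.0, min(xis)))

xis, epochs = [0.7, 0.2, 0.9], [0.0, 1.5, -3.0]

# Symmetry (25.1): permuting the xi's and the t's by the same permutation changes nothing.
perm = [2, 0, 1]
symmetric = F([xis[p] for p in perm], [epochs[p] for p in perm]) == F(xis, epochs)

# Consistency (25.2): letting the last xi grow recovers the (n-1)-dimensional CDF.
consistent = F(xis[:-1] + [1e9], epochs) == F(xis[:-1], epochs[:-1])
print(symmetric, consistent)
```

For this process the epochs play no role in the value of the CDF, which is why the checks reduce to properties of the minimum.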

The key result on the existence of stochastic processes of given FDDs is Kolmogorov’s
Existence Theorem, which states that the symmetry and consistency properties suffice
for a family of finite-dimensional distributions to correspond to the FDDs of some SP.


Theorem 25.2.1 (Kolmogorov’s Existence Theorem). Let G_1(·;·), G_2(·;·), ... be
a sequence of functions G_n: R^n × R^n → [0,1] satisfying


1) that for every n ≥ 1 and every t1, ..., tn ∈ R the function G_n(·; t1, ..., tn) is
a valid joint distribution function;¹


2) the symmetry property


\[ G_n(\xi_{\pi(1)}, \ldots, \xi_{\pi(n)}; t_{\pi(1)}, \ldots, t_{\pi(n)}) = G_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n), \quad t_1, \ldots, t_n, \xi_1, \ldots, \xi_n \in \mathbb{R},\ \pi \text{ a permutation on } \{1, \ldots, n\}; \tag{25.3} \]


¹ A function F: R^n → [0,1] is a valid joint distribution function if there exist random variables
X1, ..., Xn whose joint distribution function is F(·), i.e.,


\[ \Pr[X_1 \le \xi_1, \ldots, X_n \le \xi_n] = F(\xi_1, \ldots, \xi_n), \quad \xi_1, \ldots, \xi_n \in \mathbb{R}. \]


Not every function F: R^n → [0,1] is a valid joint distribution function. For example, a valid joint
distribution function must be monotonic in each variable. See, for example, (Billingsley, 1995,
Theorem 12.5) for a characterization of joint distribution functions.

3) and the consistency property


\[ \lim_{\xi_n \to \infty} G_n(\xi_1, \ldots, \xi_{n-1}, \xi_n; t_1, \ldots, t_{n-1}, t_n) = G_{n-1}(\xi_1, \ldots, \xi_{n-1}; t_1, \ldots, t_{n-1}), \quad t_1, \ldots, t_n, \xi_1, \ldots, \xi_{n-1} \in \mathbb{R}. \tag{25.4} \]


Then there exists a SP (X(t)) whose FDDs are given by {G_n(·;·)} in the sense
that

\[ \Pr[X(t_1) \le \xi_1, \ldots, X(t_n) \le \xi_n] = G_n(\xi_1, \ldots, \xi_n; t_1, \ldots, t_n) \]

for every n ∈ N, all t1, ..., tn ∈ R, and all ξ1, ..., ξn ∈ R.

Proof. See, for example, (Billingsley, 1995, Chapter 7, Section 36), (Cramér and
Leadbetter, 2004, Section 3.3), (Grimmett and Stirzaker, 2001, Section 8.6), or
(Doob, 1990, Chapter I § 5).


In the study of n-tuples of random variables we can use the joint distribution 
function to answer, at least in principle, most of our probability questions. When it 
comes to stochastic processes, however, there are interesting questions that cannot 
be answered using the FDDs. For example, it can be shown that the probability 
of the event that the SP (X(t)) produces a sample-path that is continuous at time
zero cannot be computed from the FDDs. This is not due to our limited analytic 
capabilities but rather because there exist two stochastic processes of identical 
FDDs where for one process this event is of zero probability whereas for the other 
it is of probability one (Cramér and Leadbetter, 2004, Section 3.6). Fortunately, 
most of the questions of interest to us in Digital Communications can be answered 
based on the FDDs. 


An exception is a very subtle point related to measurability. From the FDDs alone
one cannot determine whether the trajectories are measurable functions of time,
i.e., whether it makes sense to talk about integrals of the form ∫ X(ω, t) dt. This
issue will be revisited in Section 25.9.


The above discussion motivates us to define the set of events whose probability
can be determined from the FDDs using the axioms of probability, i.e., using the
rules that the probability of the set of all possible outcomes Ω is one and that
the probability of a countable union of disjoint events is the infinite sum of the
probabilities of the events. In the mathematical literature what we are defining is
called the σ-algebra generated by (X(t), t ∈ R) or the σ-algebra generated
by the cylindrical sets of (X(t), t ∈ R).² For the classical definition see, for
example, (Billingsley, 1995, Section 36).


Definition 25.2.2 (σ-Algebra Generated by a SP). The σ-algebra generated
by a SP (X(t), t ∈ R) which is defined over the probability space (Ω, F, P) is
the set of events (i.e., elements of F) whose probability can be computed from the
FDDs of (X(t)) using only the axioms of probability.


² It is the smallest σ-algebra with respect to which all the random variables (X(t), t ∈ R) are
measurable.


We now rephrase our previous statement about continuity as saying that the set
of ω ∈ Ω for which the function t ↦ X(ω, t) is continuous at t = 0 is not in the
σ-algebra generated by (X(t)). The probability of such sets cannot be inferred
from the FDDs alone. If such sets are assigned a probability it must be based on
some additional information that is not captured by the FDDs.


The FDDs provide a natural way to define independence between stochastic pro- 
cesses. 


Definition 25.2.3 (Independent Stochastic Processes). Two stochastic processes
(X(t)) and (Y(t)) defined on the same probability space (Ω, F, P) are said to be
independent stochastic processes if for every n ∈ N and any choice of the
epochs t1, ..., tn ∈ R, the n-tuples (X(t1), ..., X(tn)) and (Y(t1), ..., Y(tn)) are
independent.


25.3 Definition of a Gaussian SP 


By far the most important processes for modeling noise in Digital Communications 
are the Gaussian processes. Fortunately, these processes are among the mathemat- 
ically most tractable. The definition of a Gaussian SP builds on that of a Gaussian 
vector (Definition 23.1.1). 


Definition 25.3.1 (Gaussian Stochastic Processes). A SP (X(t)) is said to be a
Gaussian stochastic process if for every n ∈ N and every choice of the epochs
t1, ..., tn ∈ R, the random vector (X(t1), ..., X(tn))ᵀ is Gaussian.


Note 25.3.2. Gaussian stochastic processes are of finite variance. 


Proof. If (X(t)) is a Gaussian process, then a fortiori at each epoch t ∈ R, the
random variable X(t) is a univariate Gaussian (choose n = 1 in the above definition)
and hence, by the definition of the univariate Gaussian distribution (Definition 19.3.1),
of finite variance.


One of the things that make Gaussian processes tractable is the ease with which 
their FDDs can be specified. 


Proposition 25.3.3 (The FDDs of a Gaussian SP). If (X(t)) is a centered Gaussian
SP, then all its FDDs are determined by the mapping that specifies the covariance
between any two of its samples:


\[ (t_1, t_2) \mapsto \mathrm{Cov}[X(t_1), X(t_2)], \quad t_1, t_2 \in \mathbb{R}. \tag{25.5} \]


Proof. Let (X(t)) be a centered Gaussian SP. We shall show that for any choice of
the epochs t1, ..., tn ∈ R we can compute the joint distribution of X(t1), ..., X(tn)
from the mapping (25.5). To this end we note that since (X(t)) is a Gaussian
SP, the random vector (X(t1), ..., X(tn))ᵀ is Gaussian (Definition 25.3.1). Consequently,
its distribution is fully specified by its mean vector and covariance matrix
(Theorem 23.6.7). Its mean vector is zero, because we assumed that (X(t)) is centered.
To conclude the proof we thus only need to show that the covariance matrix
of (X(t1), ..., X(tn))ᵀ is determined by the mapping (25.5). But this is obvious
because the covariance matrix of (X(t1), ..., X(tn))ᵀ is the n × n matrix


\[ \begin{pmatrix} \mathrm{Cov}[X(t_1), X(t_1)] & \mathrm{Cov}[X(t_1), X(t_2)] & \cdots & \mathrm{Cov}[X(t_1), X(t_n)] \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}[X(t_n), X(t_1)] & \mathrm{Cov}[X(t_n), X(t_2)] & \cdots & \mathrm{Cov}[X(t_n), X(t_n)] \end{pmatrix}, \tag{25.6} \]


and each of the entries in this matrix is specified by the mapping (25.5).
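
The step from the covariance mapping (25.5) to the matrix (25.6) is mechanical, as the following sketch suggests. The helper `cov_matrix` and the particular mapping `example_cov` (an exponential kernel, assumed here to be a valid covariance mapping) are hypothetical choices made for illustration.

```python
import math

def cov_matrix(cov, epochs):
    """Return the n x n matrix (25.6) whose (j, k) entry is cov(t_j, t_k)."""
    return [[cov(tj, tk) for tk in epochs] for tj in epochs]

def example_cov(t1, t2):
    """Hypothetical covariance mapping (t1, t2) -> exp(-|t1 - t2|)."""
    return math.exp(-abs(t1 - t2))

K = cov_matrix(example_cov, [0.0, 0.5, 2.0])
for row in K:
    print([round(v, 4) for v in row])
```

Together with the zero mean vector, such a matrix is all that is needed to specify the distribution of the corresponding n-tuple of a centered Gaussian SP.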


Things become even simpler if the Gaussian process is wide-sense stationary
(Definition 25.4.2 ahead). In this case the RHS of (25.5) is determined by t1 − t2,
so the mapping (25.5) (and hence all the FDDs) is determined by the mapping
τ ↦ Cov[X(t), X(t + τ)]. But before discussing wide-sense stationary Gaussian
stochastic processes in Section 25.5, we first define stationarity and wide-sense
stationarity for general processes that are not necessarily Gaussian.


25.4 Stationary Continuous-Time Processes 


Our treatment of stationary continuous-time processes is similar to the treatment 
of their discrete-time counterparts (Chapter 13). The following is the continuous- 
time analogue of Definition 13.2.1. 


Definition 25.4.1 (Stationary Continuous-Time SP). We say that a continuous-time
SP (X(t)) is stationary (or strict sense stationary, or strongly stationary)
if for every n ∈ N, any epochs t1, ..., tn ∈ R, and every τ ∈ R,


\[ \bigl(X(t_1 + \tau), \ldots, X(t_n + \tau)\bigr) \overset{\mathscr{L}}{=} \bigl(X(t_1), \ldots, X(t_n)\bigr). \tag{25.7} \]


By considering the case where n = 1 we obtain that if (X(t)) is stationary, then
all its samples have the same distribution


\[ X(t) \overset{\mathscr{L}}{=} X(t + \tau), \quad t, \tau \in \mathbb{R}. \tag{25.8} \]


That is, the distribution of the random variable X(t) does not depend on t. By
considering n = 2 we obtain that if (X(t)) is stationary, then the joint distribution
of any two of its samples depends on how far apart they are and not on the absolute
time at which they are taken


\[ \bigl(X(t_1), X(t_2)\bigr) \overset{\mathscr{L}}{=} \bigl(X(t_1 + \tau), X(t_2 + \tau)\bigr), \quad t_1, t_2, \tau \in \mathbb{R}. \tag{25.9} \]


That is, the joint distribution of (X(t1), X(t2)) can be computed from t2 − t1.


As we did for discrete-time processes (Definition 13.3.1), we can also define wide-sense
stationarity of continuous-time processes. Recall that a process (X(t)) is
said to be of finite variance if at every time t ∈ R the random variable X(t) is of
finite variance.


Definition 25.4.2 (Wide-Sense Stationary Continuous-Time SP). A continuous-time
SP (X(t)) is said to be wide-sense stationary (or weakly stationary or
second-order stationary) if the following three conditions are met:


1) It is of finite variance.

2) Its mean is constant

\[ \mathrm{E}[X(t)] = \mathrm{E}[X(t + \tau)], \quad t, \tau \in \mathbb{R}. \tag{25.10} \]

3) The covariance between its samples satisfies

\[ \mathrm{Cov}[X(t_1), X(t_2)] = \mathrm{Cov}[X(t_1 + \tau), X(t_2 + \tau)], \quad t_1, t_2, \tau \in \mathbb{R}. \tag{25.11} \]


By considering the case where t1 = t2 in (25.11), we obtain that all the samples of
a WSS SP have the same variance:


\[ \mathrm{Var}[X(t)] = \mathrm{Var}[X(0)], \quad t \in \mathbb{R}. \tag{25.12} \]


Note 25.4.3. Every finite-variance stationary SP is WSS.


Proof. This follows because (25.8) implies (25.10), and because (25.9) implies 
(25.11). 


The reverse is not true: some WSS processes are not stationary. (Wide-sense 
stationarity concerns only means and covariances, whereas stationarity has to do 
with distributions.) 


The following definition of the autocovariance function of a continuous-time WSS 
SP is the analogue of Definition 13.5.1. 


Definition 25.4.4 (Autocovariance Function). The autocovariance function
K_XX: R → R of a WSS continuous-time SP (X(t)) is defined for every τ ∈ R by


\[ K_{XX}(\tau) \triangleq \mathrm{Cov}[X(t + \tau), X(t)], \tag{25.13} \]

where the RHS does not depend on t because (X(t)) is assumed to be WSS.


By evaluating (25.13) at τ = 0 and using (25.12), we can express the variance
of X(t) in terms of the autocovariance function K_XX as


\[ \mathrm{Var}[X(t)] = K_{XX}(0), \quad t \in \mathbb{R}. \tag{25.14} \]


We end this section with a few simple inequalities related to WSS stochastic pro- 
cesses and their autocovariance functions. 


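Lemma 25.4.5. Let (X(t)) be a WSS SP of autocovariance function K_XX. Then

\[ |K_{XX}(\tau)| \le K_{XX}(0), \quad \tau \in \mathbb{R}, \tag{25.15} \]

\[ \mathrm{E}\bigl[|X(t)|\bigr] \le \sqrt{K_{XX}(0) + \bigl(\mathrm{E}[X(0)]\bigr)^2}, \quad t \in \mathbb{R}, \tag{25.16} \]

and

\[ \mathrm{E}\bigl[|X(t)\, X(t')|\bigr] \le K_{XX}(0) + \bigl(\mathrm{E}[X(0)]\bigr)^2, \quad t, t' \in \mathbb{R}. \tag{25.17} \]


Proof. Inequality (25.15) follows from the Covariance Inequality (Corollary 3.5.2):

\[ \begin{aligned} |K_{XX}(\tau)| &= \bigl|\mathrm{Cov}[X(t + \tau), X(t)]\bigr| \\ &\le \sqrt{\mathrm{Var}[X(t + \tau)]}\, \sqrt{\mathrm{Var}[X(t)]} \\ &= K_{XX}(0), \end{aligned} \]

where the last equality follows from (25.14).


Inequality (25.16) follows from the nonnegativity of the variance of |X(t)| and the
assumption that (X(t)) is WSS:

\[ \begin{aligned} 0 &\le \mathrm{Var}\bigl[|X(t)|\bigr] \\ &= \mathrm{E}[X^2(t)] - \bigl(\mathrm{E}[|X(t)|]\bigr)^2 \\ &= \mathrm{Var}[X(t)] + \bigl(\mathrm{E}[X(t)]\bigr)^2 - \bigl(\mathrm{E}[|X(t)|]\bigr)^2 \\ &= K_{XX}(0) + \bigl(\mathrm{E}[X(0)]\bigr)^2 - \bigl(\mathrm{E}[|X(t)|]\bigr)^2. \end{aligned} \]


Finally, Inequality (25.17) follows from the Cauchy-Schwarz Inequality for random
variables (Theorem 3.5.1)

\[ \bigl|\mathrm{E}[UV]\bigr| \le \sqrt{\mathrm{E}[U^2]}\, \sqrt{\mathrm{E}[V^2]} \]

by substituting |X(t)| for U and |X(t')| for V and by noting that

\[ \mathrm{E}[X^2(t)] = \mathrm{Var}[X(t)] + \bigl(\mathrm{E}[X(t)]\bigr)^2 = K_{XX}(0) + \bigl(\mathrm{E}[X(0)]\bigr)^2, \quad t \in \mathbb{R}. \]
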

25.5 Stationary Gaussian Stochastic Processes 


For Gaussian stochastic processes we do not distinguish between stationarity and 
wide-sense stationarity. The reason is that, while for general processes the two 
concepts are different (in that every finite-variance stationary SP is WSS, but not 
every WSS SP is stationary), for Gaussian stochastic processes the two concepts are 
equivalent. These relationships between stationarity and wide-sense stationarity for 
general stochastic processes and for Gaussian stochastic processes are illustrated 
in Figure 25.1. 


Proposition 25.5.1 (Stationary Gaussian Stochastic Processes). 


(i) A Gaussian SP is stationary if, and only if, it is WSS. 


(ii) The FDDs of a centered stationary Gaussian SP are fully specified by its
autocovariance function. 


Proof. We begin by proving (i). One direction has only little to do with Gaussianity.
Since every Gaussian SP is of finite variance (Note 25.3.2), and since every
finite-variance stationary SP is WSS (Note 25.4.3), it follows that every stationary
Gaussian SP is WSS.


Figure 25.1: The relationship between wide-sense stationarity, Gaussianity, and
strict-sense stationarity. The outer region corresponds to all stochastic processes.
Within it is the set of all finite-variance processes and within that the set of all
wide-sense stationary processes. The slanted lines indicate the strict-sense stationary
processes, and the gray area corresponds to the Gaussian stochastic processes.


Gaussianity plays a much more important role in the proof of the reverse direction,
namely, that every WSS Gaussian SP is stationary. We prove this by showing that
if (X(t)) is Gaussian and WSS, then for every n ∈ N and any t1, ..., tn, τ ∈ R
the joint distribution of X(t1), ..., X(tn) is identical to the joint distribution of
X(t1 + τ), ..., X(tn + τ). To this end, let n ∈ N and t1, ..., tn, τ ∈ R be fixed.


Because (X(t)) is Gaussian, (X(t1), ..., X(tn))ᵀ and (X(t1 + τ), ..., X(tn + τ))ᵀ
are both Gaussian vectors (Definition 25.3.1). And since (X(t)) is WSS, the two
are of the same mean vector (see (25.10)). The former’s covariance matrix is


\[ \begin{pmatrix} \mathrm{Cov}[X(t_1), X(t_1)] & \cdots & \mathrm{Cov}[X(t_1), X(t_n)] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X(t_n), X(t_1)] & \cdots & \mathrm{Cov}[X(t_n), X(t_n)] \end{pmatrix} \]


and the latter’s is


\[ \begin{pmatrix} \mathrm{Cov}[X(t_1 + \tau), X(t_1 + \tau)] & \cdots & \mathrm{Cov}[X(t_1 + \tau), X(t_n + \tau)] \\ \vdots & \ddots & \vdots \\ \mathrm{Cov}[X(t_n + \tau), X(t_1 + \tau)] & \cdots & \mathrm{Cov}[X(t_n + \tau), X(t_n + \tau)] \end{pmatrix}. \]


Since (X(t)) is WSS, the two covariance matrices are identical (see (25.11)). But
two Gaussian vectors of equal mean vectors and of equal covariance matrices have
identical distributions (Theorem 23.6.7), so the distribution of (X(t1), ..., X(tn))ᵀ
is identical to that of (X(t1 + τ), ..., X(tn + τ))ᵀ. Since this has been established for
all choices of n ∈ N and all choices of t1, ..., tn, τ ∈ R, the SP (X(t)) is stationary.

Part (ii) follows from Proposition 25.3.3 and the definition of wide-sense stationarity.
Indeed, by Proposition 25.3.3, all the FDDs of a centered Gaussian SP (X(t))
are determined by the mapping (25.5). If (X(t)) is additionally WSS, then the
RHS of (25.5) can be computed from t1 − t2 and is given by K_XX(t1 − t2), so the
mapping (25.5) is fully specified by the autocovariance function K_XX.


25.6 Properties of the Autocovariance Function 


Many of the definitions and results on continuous-time WSS stochastic processes
have analogous discrete-time counterparts. But some technical issues are encountered
only in continuous time. For example, most results on continuous-time WSS
stochastic processes require that the autocovariance function of the process be
continuous at the origin, i.e., satisfy


\[ \lim_{\delta \to 0} K_{XX}(\delta) = K_{XX}(0), \tag{25.18} \]


and this condition has no discrete-time counterpart. As we next show, this condition
is equivalent to the condition


\[ \lim_{\delta \to 0} \mathrm{E}\bigl[(X(t + \delta) - X(t))^2\bigr] = 0, \quad t \in \mathbb{R}. \tag{25.19} \]

This equivalence follows from the identity

\[ \mathrm{E}\bigl[(X(t) - X(t + \delta))^2\bigr] = 2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr), \quad t, \delta \in \mathbb{R}, \tag{25.20} \]


which can be proved as follows. We first note that it suffices to prove it for centered
processes, and for such processes we then compute:

\[ \begin{aligned} \mathrm{E}\bigl[(X(t) - X(t + \delta))^2\bigr] &= \mathrm{E}\bigl[X^2(t) - 2 X(t) X(t + \delta) + X^2(t + \delta)\bigr] \\ &= \mathrm{E}[X^2(t)] - 2\,\mathrm{E}[X(t) X(t + \delta)] + \mathrm{E}[X^2(t + \delta)] \\ &= K_{XX}(0) - 2 K_{XX}(\delta) + K_{XX}(0) \\ &= 2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr), \end{aligned} \]

where the first equality follows by opening the square; the second by the linearity of
expectation; the third by the definition of K_XX; and the final equality by collecting
terms.
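
As a sanity check of Identity (25.20), consider the hypothetical centered WSS process X(t) = A cos(2π F0 t) + B sin(2π F0 t), where A and B are assumed uncorrelated, zero-mean, and of unit variance, so that its autocovariance is cos(2π F0 τ). The names `F0`, `K`, and `mean_square_increment` are illustrative; both sides are computed in closed form, so no simulation is involved.

```python
import math

F0 = 1.0  # hypothetical frequency of the illustrative process

def K(tau):
    """Autocovariance of X(t) = A cos(2 pi F0 t) + B sin(2 pi F0 t), with A, B
    uncorrelated, zero-mean, unit-variance: K(tau) = cos(2 pi F0 tau)."""
    return math.cos(2 * math.pi * F0 * tau)

def mean_square_increment(t, delta):
    """E[(X(t) - X(t + delta))^2], computed from E[A^2] = E[B^2] = 1 and E[AB] = 0."""
    a, b = 2 * math.pi * F0 * t, 2 * math.pi * F0 * (t + delta)
    return (math.cos(a) - math.cos(b)) ** 2 + (math.sin(a) - math.sin(b)) ** 2

for t, delta in [(0.0, 0.1), (0.3, 0.1), (1.7, -0.25)]:
    lhs = mean_square_increment(t, delta)
    rhs = 2 * (K(0.0) - K(delta))
    print(round(lhs, 6), round(rhs, 6))
```

The two printed columns agree for every t, which reflects the fact that the RHS of (25.20) does not depend on t.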


We note here that if the autocovariance function of a WSS process is continuous 
at the origin, then it is continuous everywhere. In fact, it is uniformly continuous: 


Lemma 25.6.1. If the autocovariance function of a WSS continuous-time SP is
continuous at the origin, then it is a uniformly continuous function.


Proof. We first note that it suffices to prove the lemma for centered processes.
Let (X(t)) be such a process. For every τ, δ ∈ R we then have


\[ \begin{aligned} |K_{XX}(\tau + \delta) - K_{XX}(\tau)| &= \bigl|\mathrm{E}[X(\tau + \delta) X(0)] - \mathrm{E}[X(\tau) X(0)]\bigr| \\ &= \bigl|\mathrm{E}\bigl[(X(\tau + \delta) - X(\tau))\, X(0)\bigr]\bigr| \\ &= \bigl|\mathrm{Cov}[X(\tau + \delta) - X(\tau), X(0)]\bigr| \\ &\le \sqrt{\mathrm{E}\bigl[(X(\tau + \delta) - X(\tau))^2\bigr]}\, \sqrt{\mathrm{E}[X^2(0)]} \\ &= \sqrt{2\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr)}\, \sqrt{K_{XX}(0)} \\ &= \sqrt{2 K_{XX}(0)\bigl(K_{XX}(0) - K_{XX}(\delta)\bigr)}, \end{aligned} \tag{25.21} \]


where the equality in the first line follows from the definition of the autocovariance
function because (X(t)) is centered; the equality in the second line by the linearity
of expectation; the equality in the third line by the definition of the covariance
between two zero-mean random variables; the inequality in the fourth line by the
Covariance Inequality (Corollary 3.5.2); the equality in the fifth line by (25.20);
and the final equality by trivial algebra. The uniform continuity of K_XX now
follows from (25.21) by noting that its RHS does not depend on τ and that, by our
assumption about the continuity of K_XX at zero, it tends to zero as δ → 0.


We next derive two important properties of autocovariance functions and then 
demonstrate in Theorem 25.6.2 that these properties characterize those functions 
that can arise as the autocovariance functions of a WSS SP. These properties are 
the continuous-time analogues of (13.12) & (13.13), and the proofs are almost 
identical. We first state the properties and then proceed to prove them. 


The first property is that the autocovariance function K_XX of any continuous-time
WSS process (X(t)) is a symmetric function


\[ K_{XX}(-\tau) = K_{XX}(\tau), \quad \tau \in \mathbb{R}. \tag{25.22} \]


The second is that it is a positive definite function in the sense that for every
n ∈ N, and for every choice of the coefficients α1, ..., αn ∈ R and of the epochs
t1, ..., tn ∈ R


\[ \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(t_\nu - t_{\nu'}) \ge 0. \tag{25.23} \]

To prove (25.22) we calculate


\[ \begin{aligned} K_{XX}(\tau) &= \mathrm{Cov}[X(t + \tau), X(t)] \\ &= \mathrm{Cov}[X(t'), X(t' - \tau)] \\ &= \mathrm{Cov}[X(t' - \tau), X(t')] \\ &= K_{XX}(-\tau), \quad \tau \in \mathbb{R}, \end{aligned} \]

where the first equality follows from the definition of K_XX(τ) (25.13); the second
by defining t' ≜ t + τ; the third because Cov[X, Y] = Cov[Y, X] (for real random
variables); and the final equality by the definition of K_XX(−τ) (25.13).


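To prove (25.23) we compute


\[ \begin{aligned} \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K_{XX}(t_\nu - t_{\nu'}) &= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \mathrm{Cov}[X(t_\nu), X(t_{\nu'})] \\ &= \mathrm{Cov}\Bigl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu), \sum_{\nu'=1}^{n} \alpha_{\nu'} X(t_{\nu'})\Bigr] \\ &= \mathrm{Var}\Bigl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\Bigr] \\ &\ge 0. \end{aligned} \tag{25.24} \]
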
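
The quadratic form in (25.23) is easy to evaluate numerically. The sketch below is illustrative only: `exp_kernel` (the function τ ↦ exp(−|τ|), assumed here to be a valid autocovariance function) and `rect_kernel` (a symmetric function that fails the condition) are hypothetical examples, and the random draws merely probe the inequality rather than prove it.

```python
import math
import random

def quadratic_form(K, alphas, epochs):
    """Left-hand side of (25.23): sum over nu, nu' of alpha_nu alpha_nu' K(t_nu - t_nu')."""
    return sum(a * b * K(s - t)
               for a, s in zip(alphas, epochs)
               for b, t in zip(alphas, epochs))

def exp_kernel(tau):
    """Hypothetical autocovariance function tau -> exp(-|tau|)."""
    return math.exp(-abs(tau))

def rect_kernel(tau):
    """Symmetric but not positive definite: 1 for |tau| <= 1 and 0 otherwise."""
    return 1.0 if abs(tau) <= 1.0 else 0.0

rng = random.Random(1)
worst = min(
    quadratic_form(exp_kernel,
                   [rng.uniform(-1, 1) for _ in range(5)],
                   [rng.uniform(-3, 3) for _ in range(5)])
    for _ in range(200)
)
print(worst >= -1e-12)

# With epochs 0, 1, 2 and coefficients 1, -1, 1 the rectangular function yields -1.
print(quadratic_form(rect_kernel, [1.0, -1.0, 1.0], [0.0, 1.0, 2.0]))
```

The negative value obtained for the rectangular function shows that symmetry alone does not make a function an autocovariance function.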
The next theorem demonstrates that Properties (25.22) and (25.23) characterize 
the autocovariance functions of WSS stochastic processes (cf. Theorem 13.5.2). 


Theorem 25.6.2. Every symmetric positive definite function is the autocovariance 
function of some stationary Gaussian SP. 


Proof. The proof is based on Kolmogorov’s Existence Theorem (Theorem 25.2.1)
and is only sketched here. Let K(·) be a symmetric and positive definite function
from R to R. The idea is to consider for every n ∈ N and for every choice of the
epochs t1, ..., tn ∈ R the joint distribution function G_n(·; t1, ..., tn) corresponding
to the centered multivariate Gaussian distribution of covariance matrix


\[ \begin{pmatrix} K(t_1 - t_1) & K(t_1 - t_2) & \cdots & K(t_1 - t_n) \\ \vdots & \vdots & \ddots & \vdots \\ K(t_n - t_1) & K(t_n - t_2) & \cdots & K(t_n - t_n) \end{pmatrix} \]


and to verify that the sequence {G_n(·;·)} satisfies the symmetry and consistency
requirements of Kolmogorov’s Existence Theorem. The details, which can be found
in (Doob, 1990, Chapter II, § 3, Theorem 3.1), are omitted.
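
The finite-dimensional distributions used in this construction can also be sampled. The following sketch draws one n-tuple from the centered Gaussian distribution whose covariance matrix has entries K(t_j − t_k); it uses a textbook Cholesky factorization, which is a technique swapped in here for illustration and not part of the proof. It assumes the matrix is strictly positive definite (true for the hypothetical `exp_kernel` with distinct epochs); a merely positive semidefinite matrix could make the factorization divide by zero.

```python
import math
import random

def cholesky(K):
    """Lower-triangular L with L L^T = K, for a symmetric positive definite matrix K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def sample_gaussian_vector(K_fn, epochs, rng):
    """Draw (X(t_1), ..., X(t_n)) from the centered Gaussian of covariance K_fn(t_j - t_k)."""
    K = [[K_fn(tj - tk) for tk in epochs] for tj in epochs]
    L = cholesky(K)
    w = [rng.gauss(0.0, 1.0) for _ in epochs]  # IID standard Gaussians
    return [sum(L[i][k] * w[k] for k in range(i + 1)) for i in range(len(epochs))]

def exp_kernel(tau):
    """Hypothetical symmetric positive definite function tau -> exp(-|tau|)."""
    return math.exp(-abs(tau))

epochs = [0.0, 0.4, 1.0, 2.5]
print(sample_gaussian_vector(exp_kernel, epochs, random.Random(7)))
```

Sampling finitely many epochs in this way only illustrates the FDDs; the existence of a process on the whole real line is what Kolmogorov’s Existence Theorem provides.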


25.7 The Power Spectral Density of a Continuous-Time SP 


Under suitable conditions, engineers usually define the power spectral density of a
WSS SP as the Fourier Transform of its autocovariance function. There is nothing
wrong with this definition, and we encourage the reader to think about the PSD
in this way.³ We, however, prefer a slightly more general definition that allows us
also to consider discontinuous spectra and, more importantly, allows us to infer
that any integrable, nonnegative, symmetric function is the PSD of some Gaussian
SP (Proposition 25.7.3). Fortunately, the two definitions agree whenever the
autocovariance function is continuous and integrable.


Before defining the PSD, we pause to discuss the Fourier Transform of the autocovariance.
If the autocovariance function K_XX of a WSS SP (X(t)) is integrable,
i.e., if


\[ \int_{-\infty}^{\infty} |K_{XX}(\tau)| \, d\tau < \infty, \tag{25.25} \]

then we can discuss its FT K̂_XX. The following proposition summarizes the main
properties of the FT of continuous integrable autocovariance functions.


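Proposition 25.7.1. If the autocovariance function K_XX is continuous at the origin
and integrable, then its Fourier Transform K̂_XX is nonnegative


\[ \hat{K}_{XX}(f) \ge 0, \quad f \in \mathbb{R}, \tag{25.26} \]


and symmetric


\[ \hat{K}_{XX}(-f) = \hat{K}_{XX}(f), \quad f \in \mathbb{R}. \tag{25.27} \]

Moreover, the Inverse Fourier Transform recovers K_XX in the sense that⁴


\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} \hat{K}_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.28} \]
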


Proof. This result can be deduced from three results in (Feller, 1971, Chap- 
ter XIX): the theorem in Section 3, Bochner’s Theorem in Section 2, and Lemma 2 
in Section 2. 


Definition 25.7.2 (The PSD of a Continuous-Time WSS SP). We say that the
WSS continuous-time SP (X(t)) is of power spectral density (PSD) S_XX if S_XX
is a nonnegative, symmetric, integrable function from R to R whose Inverse Fourier
Transform is the autocovariance function K_XX of (X(t)):


\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.29} \]


A few remarks regarding this definition: 


³ Engineers can, however, be a bit sloppy in that they sometimes speak of a SP whose PSD
is discontinuous, e.g., the Brickwall function f ↦ I{|f| ≤ W}. This is inconsistent with their
definition because the FT of an integrable function must be continuous (Theorem 6.2.11), and
consequently if the autocovariance function is integrable then its FT cannot be discontinuous.
Our more general definition does not suffer from this problem and allows for discontinuous PSDs.

⁴ Recall that without additional assumptions one is not guaranteed that the Inverse Fourier
Transform of the Fourier Transform of a function will be identical to the original function. Here we
need not make any additional assumptions because we already assumed that the autocovariance
function is continuous and because autocovariance functions are positive definite.

(i) By the uniqueness of the IFT (the analogue of Theorem 6.2.12 for the IFT) it
follows that if two functions are PSDs of the same WSS SP, then they must be
equal except on a set of frequencies of Lebesgue measure zero. Consequently,
we shall often speak of “the” PSD as though it were unique.


(ii) By Proposition 25.7.1, if K_XX is continuous and integrable, then (X(t)) has
a PSD in the sense of Definition 25.7.2, and this PSD is the FT of K_XX.
There are, however, autocovariance functions that are not integrable and
that nonetheless have a PSD in the sense of Definition 25.7.2. For example,
τ ↦ sinc(τ).
Thus, every continuous autocovariance function that has a PSD in the engineers’
sense (i.e., that is integrable) also has the same PSD according to
our definition, but our definition is more general in that some autocovariance
functions that have a PSD according to our definition are not integrable and
therefore do not have a PSD in the engineers’ sense.


(iii) By substituting τ = 0 in (25.29) and using (25.14) we can express the variance
of X(t) in terms of the PSD S_XX as


\[ \mathrm{Var}[X(t)] = K_{XX}(0) = \int_{-\infty}^{\infty} S_{XX}(f) \, df, \quad t \in \mathbb{R}. \tag{25.30} \]


(iv) Only processes with continuous autocovariance functions have PSDs, because 
the RHS of (25.29), being the IFT of an integrable function, must be contin- 
uous (Theorem 6.2.11 (ii)). 


(v) It can be shown that if the autocovariance function can be written as the 
IFT of some integrable function, then this latter function must be nonneg- 
ative (except on a set of frequencies of Lebesgue measure zero). This is the 
continuous-time analogue of Proposition 13.6.3. 
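
Remarks (ii) and (iii) can be illustrated numerically. The sketch below takes the brickwall PSD f ↦ I{|f| ≤ 1/2} as a hypothetical example and approximates the integral (25.29) by a midpoint rule; it assumes the convention sinc(x) = sin(πx)/(πx), under which the exact IFT of this PSD is τ ↦ sinc(τ). The function names and the grid size `n` are illustrative choices.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1 (assumed convention)."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def brickwall_psd(f):
    """Illustrative PSD: 1 for |f| <= 1/2 and 0 otherwise."""
    return 1.0 if abs(f) <= 0.5 else 0.0

def autocov_from_psd(psd, tau, f_max=0.5, n=4000):
    """Midpoint-rule approximation of (25.29) over [-f_max, f_max].

    Because the PSD is symmetric, the imaginary part of the integrand cancels,
    so only cos(2 pi f tau) is integrated.
    """
    h = 2 * f_max / n
    total = 0.0
    for k in range(n):
        f = -f_max + (k + 0.5) * h
        total += psd(f) * math.cos(2 * math.pi * f * tau)
    return total * h

for tau in (0.0, 0.5, 1.0, 2.5):
    print(tau, round(autocov_from_psd(brickwall_psd, tau), 6), round(sinc(tau), 6))
```

At τ = 0 the approximation returns the integral of the PSD, which by (25.30) is the variance of X(t).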


The nonnegativity, symmetry, and integrability conditions characterize PSDs in 
the following sense: 


Proposition 25.7.3. Every nonnegative, symmetric, integrable function is the PSD 
of some stationary Gaussian SP whose autocovariance function is continuous. 


Proof. Let S(·) be some integrable, nonnegative, and symmetric function from R
to the nonnegative reals. Define K(·) to be its IFT


\[ K(\tau) = \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.31} \]

We shall verify that K(·) satisfies the hypotheses of Theorem 25.6.2, namely, that
it is symmetric and positive definite. It will then follow from Theorem 25.6.2 that
there exists a stationary Gaussian SP (X(t)) whose autocovariance function K_XX
is equal to K(·) and is thus given by


\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}. \tag{25.32} \]

This will establish that (X(t)) is of PSD S(·). The continuity of K_XX will follow
from the continuity of the IFT of integrable functions (Theorem 6.2.11).

To conclude the proof we need to show that the function K(·) defined in (25.31)
is symmetric and positive definite. The symmetry follows from our assumption
that S(·) is symmetric:


\[ \begin{aligned} K(-\tau) &= \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f (-\tau)} \, df \\ &= \int_{-\infty}^{\infty} S(-\tilde{f}) \, e^{i 2\pi \tilde{f} \tau} \, d\tilde{f} \\ &= \int_{-\infty}^{\infty} S(\tilde{f}) \, e^{i 2\pi \tilde{f} \tau} \, d\tilde{f} \\ &= K(\tau), \quad \tau \in \mathbb{R}, \end{aligned} \]


where the first equality follows from (25.31); the second from the change of variable
f̃ ≜ −f; the third by the symmetry of S(·); and the final equality again by (25.31).


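We next prove that K(·) is positive definite. To that end we fix some n ∈ N, some
constants α1, ..., αn ∈ R, and some epochs t1, ..., tn ∈ R and compute:


\[ \begin{aligned} \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} K(t_\nu - t_{\nu'}) &= \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} \int_{-\infty}^{\infty} S(f) \, e^{i 2\pi f (t_\nu - t_{\nu'})} \, df \\ &= \int_{-\infty}^{\infty} S(f) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'} e^{i 2\pi f (t_\nu - t_{\nu'})} \Bigr) df \\ &= \int_{-\infty}^{\infty} S(f) \Bigl( \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu} \, \alpha_{\nu'} e^{-i 2\pi f t_{\nu'}} \Bigr) df \\ &= \int_{-\infty}^{\infty} S(f) \Bigl( \sum_{\nu=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu} \Bigr) \Bigl( \sum_{\nu'=1}^{n} \alpha_{\nu'} e^{i 2\pi f t_{\nu'}} \Bigr)^{*} df \\ &= \int_{-\infty}^{\infty} S(f) \, \Bigl| \sum_{\nu=1}^{n} \alpha_\nu e^{i 2\pi f t_\nu} \Bigr|^2 df \\ &\ge 0, \end{aligned} \]


where the first equality follows from (25.31); the subsequent equalities by simple
algebra; and the last inequality from our assumption that S(·) is nonnegative.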


25.8 The Spectral Distribution Function 


In this section we shall state without proof Bochner’s Theorem on continuous
positive definite functions and discuss its application to continuous autocovariance
functions. We shall then define the spectral distribution function of WSS stochastic
processes. The concept of a spectral distribution function is more general than
that of a PSD, because every WSS SP with a continuous autocovariance function has
a spectral distribution function, but only some have a PSD. Nevertheless, for our
purposes, the notion of PSD will suffice, and the results of this section will not be
used in the rest of the book.


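Recall that the characteristic function Φ_X(·) of a RV X is the mapping from R
to C defined by

\[ \varpi \mapsto \mathrm{E}\bigl[e^{i \varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{25.33} \]


If X is symmetric (i.e., has a symmetric distribution) in the sense that

\[ \Pr[X \ge x] = \Pr[X \le -x], \quad x \in \mathbb{R}, \tag{25.34} \]


then Φ_X(·) only takes on real values and is a symmetric function, as the following
argument shows. The symmetry of the distribution of X implies that X and −X
have the same distribution, which implies that their exponentiations have the same
law


\[ e^{i \varpi X} \overset{\mathscr{L}}{=} e^{-i \varpi X}, \quad \varpi \in \mathbb{R}, \tag{25.35} \]


and a fortiori that the expectations of the two exponentials are equal

\[ \mathrm{E}\bigl[e^{i \varpi X}\bigr] = \mathrm{E}\bigl[e^{-i \varpi X}\bigr], \quad \varpi \in \mathbb{R}. \tag{25.36} \]


The LHS of (25.36) is Φ_X(ϖ), and the RHS is Φ_X(−ϖ), thus demonstrating the
symmetry of Φ_X(·). To establish that (25.34) also implies that Φ_X(·) is real, we
note that, by (25.36),


\[ \begin{aligned} \Phi_X(\varpi) &= \mathrm{E}\bigl[e^{i \varpi X}\bigr] \\ &= \frac{1}{2} \Bigl( \mathrm{E}\bigl[e^{i \varpi X}\bigr] + \mathrm{E}\bigl[e^{-i \varpi X}\bigr] \Bigr) \\ &= \mathrm{E}\bigl[\cos(\varpi X)\bigr], \quad \varpi \in \mathbb{R}, \end{aligned} \]


which is real. Here the first equality follows from (25.33); the second from (25.36);
and the third from the linearity of expectation.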
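
The argument can be checked on a small discrete example. In the sketch below, `symmetric_pmf` and `skewed_pmf` are hypothetical distributions given as dictionaries from values to probabilities, and `char_fn` is an illustrative helper that evaluates the expectation in (25.33) as a finite sum.

```python
import cmath
import math

def char_fn(pmf, w):
    """Characteristic function E[exp(i w X)] of a discrete RV given as {value: probability}."""
    return sum(p * cmath.exp(1j * w * x) for x, p in pmf.items())

# Hypothetical symmetric distribution: Pr[X = x] = Pr[X = -x].
symmetric_pmf = {-2.0: 0.1, -1.0: 0.4, 1.0: 0.4, 2.0: 0.1}
# A distribution that is not symmetric, for contrast.
skewed_pmf = {0.0: 0.5, 1.0: 0.5}

for w in (0.0, 0.7, 3.0):
    phi = char_fn(symmetric_pmf, w)
    expected = sum(p * math.cos(w * x) for x, p in symmetric_pmf.items())  # E[cos(wX)]
    print(w, abs(phi.imag) < 1e-12, abs(phi.real - expected) < 1e-12)

print(abs(char_fn(skewed_pmf, 0.7).imag) > 0.1)
```

For the symmetric distribution the imaginary part vanishes (up to rounding) and the real part equals E[cos(ϖX)], whereas the asymmetric distribution has a characteristic function with a nonzero imaginary part.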


Bochner’s Theorem establishes a correspondence between continuous, symmetric, 
positive definite functions and characteristic functions. 


Theorem 25.8.1 (Bochner’s Theorem). Let the mapping Φ(·) from R to R be
continuous. Then the following two conditions are equivalent:


a) Φ(·) is the characteristic function of some RV having a symmetric distribution.


b) Φ(·) is a symmetric positive definite function satisfying Φ(0) = 1.


Proof. See (Feller, 1971, Chapter XIX, Section 2) or (Loève, 1963, Chapter IV,
Section 14) or (Katznelson, 1976, Chapter VI, Section 2.8).


Bochner’s Theorem is the key to understanding autocovariance functions: 

Proposition 25.8.2. Let (X(t)) be a WSS SP whose autocovariance function K_XX
is continuous. Then:


(i) There exists a symmetric RV S such that


\[ K_{XX}(\tau) = K_{XX}(0) \, \mathrm{E}\bigl[e^{i 2\pi \tau S}\bigr], \quad \tau \in \mathbb{R}. \tag{25.37} \]


(ii) If K_XX(0) > 0, then the distribution of S in (25.37) is uniquely determined
by K_XX, and (X(t)) has a PSD if, and only if, S has a density.


Proof. If K_XX(0) = 0, then (X(t)) is deterministic in the sense that for every
epoch t ∈ R the variance of X(t) is zero. By the inequality |K_XX(τ)| ≤ K_XX(0)
(Lemma 25.4.5, (25.15)) it follows that if K_XX(0) = 0 then K_XX(τ) = 0 for all
τ ∈ R, and (25.37) holds in this case for any choice of S and there is nothing else
to prove.


Consider now the case K_XX(0) > 0. To prove Part (i) we note that because K_XX is
by assumption continuous, and because all autocovariance functions are symmetric
and positive definite (see (25.22) and (25.23)), it follows that the mapping


\[ \tau \mapsto \frac{K_{XX}(\tau)}{K_{XX}(0)}, \quad \tau \in \mathbb{R}, \]

is a continuous, symmetric, positive definite mapping that takes on the value one
at τ = 0. Consequently, by Bochner’s Theorem, there exists a RV R of a symmetric
distribution such that


\[ \frac{K_{XX}(\tau)}{K_{XX}(0)} = \mathrm{E}\bigl[e^{i \tau R}\bigr], \quad \tau \in \mathbb{R}. \]


It follows that if we define S as R/(2π) then (25.37) will hold, and Part (i) is thus
also established for the case where K_XX(0) > 0.


We now conclude the treatment of the case K_XX(0) > 0 by proving Part (ii) for
this case. That the distribution of S is unique follows because (25.37) implies that


\[ \mathrm{E}\bigl[e^{i \varpi S}\bigr] = \frac{K_{XX}\bigl(\varpi / (2\pi)\bigr)}{K_{XX}(0)}, \quad \varpi \in \mathbb{R}, \]


so K_XX determines the characteristic function of S and hence also its distribution
(Theorem 17.4.4).


Because the distribution of S is symmetric, if S has a density then it also has a
symmetric density. Denote by f_S(·) a symmetric density function for S. In terms
of f_S(·) we can rewrite (25.37) as


\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} K_{XX}(0) \, f_S(s) \, e^{i 2\pi s \tau} \, ds, \quad \tau \in \mathbb{R}, \]


so the nonnegative symmetric function K_XX(0) f_S(·) is a PSD of (X(t)). Conversely,
if (X(t)) has PSD S_XX, then


\[ K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f) \, e^{i 2\pi f \tau} \, df, \quad \tau \in \mathbb{R}, \tag{25.38} \]

and (25.37) holds with S having the density


\[ f_S(s) = \frac{S_{XX}(s)}{K_{XX}(0)}, \quad s \in \mathbb{R}. \tag{25.39} \]


(The RHS of (25.39) is symmetric, nonnegative, and integrates to 1 by (25.30).)


Proposition 25.8.2 motivates us to define the spectral distribution function of a 
continuous autocovariance function (or of a WSS SP having such an autocovariance 
function) as follows. 


Definition 25.8.3 (Spectral Distribution Function). The spectral distribution 
function of a continuous autocovariance function Kxx is the mapping 


\[ \xi \mapsto K_{XX}(0)\, \Pr[S \le \xi], \tag{25.40} \]


where S is a random variable for which (25.37) holds. 
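
As a numerical illustration of (25.37), (25.39), and (25.40), the following Python sketch assumes a hypothetical brickwall PSD, Sxx(f) = N0/2 for |f| ≤ W and zero otherwise; the values of N0 and W and all function names are illustrative choices that do not appear in the text. Under this assumption S is uniform over [−W, W], and the sketch checks by midpoint quadrature that Kxx(0) E[e^{i2πτS}] agrees with the inverse FT of the PSD, N0 W sinc(2Wτ).

```python
import math

# Assumed example (not from the text): S_XX(f) = N0/2 for |f| <= W, else 0.
N0, W = 2.0, 3.0
K0 = N0 * W  # K_XX(0) is the integral of the PSD

def density_S(s):
    """Density of S as in (25.39): S_XX(s) / K_XX(0), here uniform on [-W, W]."""
    return (N0 / 2) / K0 if abs(s) <= W else 0.0

def K_from_S(tau, n=20000):
    """K_XX(0) * E[exp(i 2 pi tau S)] by midpoint quadrature over [-W, W].

    S is symmetric, so the imaginary part vanishes and only the cosine remains.
    """
    h = 2 * W / n
    total = 0.0
    for i in range(n):
        s = -W + (i + 0.5) * h
        total += density_S(s) * math.cos(2 * math.pi * tau * s) * h
    return K0 * total

def K_closed_form(tau):
    """Inverse FT of the brickwall PSD: N0 * W * sin(x) / x with x = 2 pi W tau."""
    x = 2 * math.pi * W * tau
    return K0 if x == 0 else K0 * math.sin(x) / x

def spectral_distribution(xi):
    """The mapping (25.40): xi -> K_XX(0) * Pr[S <= xi]."""
    return K0 * min(1.0, max(0.0, (xi + W) / (2 * W)))

for tau in (0.0, 0.1, 0.37, 1.0):
    print(tau, K_from_S(tau), K_closed_form(tau))
```

In this example the spectral distribution function rises linearly from 0 at ξ = −W to Kxx(0) at ξ = W, as one would expect when S has a uniform density.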


25.9 The Average Power 


We next address the average power in the sample-paths of a SP. We would like to
better understand formal expressions of the form

\[ \frac{1}{T} \int_{-T/2}^{T/2} X^2(t)\, dt \]

for a SP (X(t)) defined on the probability space (Ω, F, P). Recalling that if we fix
ω ∈ Ω then we can view the trajectory t ↦ X(ω,t) as a function of time, we would
like to think about the integral above as the time-integral of the square of the
trajectory t ↦ X(ω,t). Since the result of this integral is a (nonnegative) number
that depends on ω, we would like to view this result as a nonnegative RV

\[ \omega \mapsto \frac{1}{T} \int_{-T/2}^{T/2} X^2(\omega,t)\, dt, \quad \omega \in \Omega. \]


Mathematicians, however, would object to our naive approach on two grounds. The
first is that it is prima facie unclear whether for every fixed ω ∈ Ω the mapping
t ↦ X²(ω,t) is sufficiently well-behaved to allow us to discuss its integral. (It may
not be Lebesgue measurable.) The second is that, even if this integral could be
carried out for every ω ∈ Ω, it is prima facie unclear that the result would be a
RV. While it would certainly be a mapping from Ω to the extended reals (allowing
for +∞), it is not clear that it would satisfy the technical measurability conditions
that random variables must meet.⁵

⁵ By “X is a random variable possibly taking on the value +∞” we mean that X is a mapping
from Ω to R ∪ {+∞} with the set {ω ∈ Ω : X(ω) ≤ ξ} being an event for every ξ ∈ R and with
the set {ω ∈ Ω : X(ω) = +∞} also being an event.

To address these objections we shall assume that (X(t)) is a “measurable stochastic
process.” This is a technical condition that will be foreign to most readers and
that will be inessential to the rest of this book. We mention it here because, in
order to be mathematically honest, we shall have to slip this attribute into some
of the theorems that we shall later state. Nothing will be lost on readers who
replace “measurable stochastic process” with “stochastic process satisfying a mild
technical condition.”

Fortunately, this technical condition is, indeed, very mild. For example,
Proposition 25.7.3 still holds if we slip in the attribute “measurable” before the words
“Gaussian process.” Similarly, in Theorem 25.6.2, if we add the hypothesis that
the given function is continuous at the origin, then we can slip in the attribute
“measurable” before the words “stationary Gaussian stochastic process.”⁶


For the benefit of readers who are familiar with Measure Theory, we provide the 
following definition. 


Definition 25.9.1 (Measurable SP). Let (X(t), t ∈ R) be a SP defined over the
probability space (Ω, F, P). We say that the process is a measurable stochastic
process if the mapping (ω,t) ↦ X(ω,t) is a measurable mapping from Ω × R to R
when the range R is endowed with the Borel σ-algebra on R and when the domain
Ω × R is endowed with the σ-algebra defined by the product of F on Ω by the Borel
σ-algebra on R.


The nice thing about measurable stochastic processes is that if (X(t)) is a
measurable SP, then for every ω ∈ Ω the trajectory t ↦ X(ω,t) is a Borel (and hence also
Lebesgue) measurable function of time; see (Halmos, 1950, Chapter 7, Section 34,
Theorem B) or (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.1 (ii)).
Moreover, for such processes we can sometimes use Fubini’s Theorem to swap the order
in which we compute time-integrals and expectations; see (Halmos, 1950,
Chapter 7, Section 36) or (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.3 (ii)).

We can now state the main result of this section regarding the average power in a
WSS SP.


Proposition 25.9.2 (Power in a Centered WSS SP). If (X(t)) is a measurable,
centered, WSS SP defined over the probability space (Ω, F, P) and having the
autocovariance function Kxx, then for every a, b ∈ R satisfying a < b the mapping

\[ \omega \mapsto \frac{1}{b-a} \int_a^b X^2(\omega,t)\, dt \tag{25.41} \]

defines a RV (possibly taking on the value +∞) satisfying

\[ \mathrm{E}\biggl[\frac{1}{b-a} \int_a^b X^2(t)\, dt\biggr] = K_{XX}(0). \tag{25.42} \]


⁶ These are but very special cases of a much more general result that states that given FDDs
corresponding to a WSS SP of an autocovariance that is continuous at the origin, there exists
a SP of the given FDDs that is also measurable. See, for example, (Doob, 1990, Chapter II,
Section 2, Theorem 2.6). (Replacing the values ±∞ with zero may ruin the separability but
not the measurability.)

Proof. The proof of (25.42) is straightforward and merely requires swapping the
order of integration and expectation. This swap can be justified using Fubini’s
Theorem. Heuristically, the swapping of expectation and integration can be justified by
thinking about the integral as being a Riemann integral that can be approximated
by finite sums and by then recalling the linearity of expectation that guarantees
that the expectation of a finite sum is the sum of the expectations. We then have

\[
\begin{aligned}
\mathrm{E}\biggl[\int_a^b X^2(t)\, dt\biggr] &= \int_a^b \mathrm{E}\bigl[X^2(t)\bigr]\, dt \\
&= \int_a^b K_{XX}(0)\, dt \\
&= (b-a)\, K_{XX}(0),
\end{aligned}
\]

where the first equality follows by swapping the integration with the expectation;
the second because our assumption that (X(t)) is centered implies that for every
t ∈ R the RV X(t) is centered and by (25.13); and the final equality because the
integrand is constant.

That (25.41) is a RV (possibly taking on the value +∞) follows from Fubini’s
Theorem.


Recalling Definition 14.6.1 of the power in a SP as

\[ \lim_{T \to \infty} \frac{1}{T}\, \mathrm{E}\biggl[\int_{-T/2}^{T/2} X^2(t)\, dt\biggr], \]

we conclude:

Corollary 25.9.3. The power in a centered, measurable, WSS SP (X(t)) of
autocovariance function Kxx is equal to Kxx(0).
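
The distinction between the power of a single trajectory and the expectation in (25.42) can be illustrated with a small Python sketch. It assumes a hypothetical random-phase sinusoid X(t) = A cos(2π f0 t + Θ), with Θ uniform over [0, 2π), which is a centered WSS SP with Kxx(τ) = (A²/2) cos(2π f0 τ); the parameter values and function names are illustrative and not taken from the text. The expectation over Θ is approximated by an average over an equispaced grid of phases.

```python
import math

# Assumed example (not from the text): X(t) = A cos(2 pi f0 t + Theta),
# with Theta uniform on [0, 2 pi). This SP is centered and WSS with
# K_XX(tau) = (A^2 / 2) cos(2 pi f0 tau), so K_XX(0) = A^2 / 2.
A, F0 = 1.5, 0.7

def time_avg_power(theta, a, b, n=2000):
    """(1/(b-a)) * integral_a^b X^2(omega, t) dt for one trajectory (midpoint rule)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        total += (A * math.cos(2 * math.pi * F0 * t + theta)) ** 2 * h
    return total / (b - a)

def expected_time_avg_power(a, b, n_phases=64):
    """Average of the trajectory powers over an equispaced grid of phases."""
    thetas = [2 * math.pi * m / n_phases for m in range(n_phases)]
    return sum(time_avg_power(th, a, b) for th in thetas) / n_phases

a, b = -0.3, 1.1
print(time_avg_power(0.0, a, b))      # depends on the trajectory
print(expected_time_avg_power(a, b))  # close to K_XX(0) = A**2 / 2
```

For a finite interval the time-average of a single trajectory generally differs from Kxx(0); it is the expectation of this RV that equals Kxx(0) for every choice of a < b.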


25.10 Linear Functionals 


For the problem of detecting continuous-time signals corrupted by noise, we shall
be interested in stochastic integrals of the form

\[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt \tag{25.43} \]

for WSS stochastic processes (X(t)) defined over a probability space (Ω, F, P)
and for properly well-behaved deterministic functions s(·). We would like to think
about the result of such an integral as defining a RV

\[ \omega \mapsto \int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt \tag{25.44} \]

that maps each ω ∈ Ω to the real number that is the result of the integration
over time of the product of the trajectory t ↦ X(ω,t) corresponding to ω by the
deterministic function t ↦ s(t). That is, each ω is mapped to the inner product
between its trajectory t ↦ X(ω,t) and the function s(·).


This is an excellent way of thinking about such integrals, but we do run into
some mathematical objections similar to those we encountered in Section 25.9.
For example, it is not obvious that for each ω ∈ Ω the mapping t ↦ X(ω,t) s(t)
is a sufficiently well-behaved function for the time-integral to be defined. As we
shall see, for this reason we must impose certain restrictions on s(·), and we will
not claim that t ↦ X(ω,t) s(t) is integrable for every ω ∈ Ω but only for ω’s in
some subset of Ω having probability one. Also, even if this issue is addressed, it is
unclear that the mapping of ω to the result of the integration is a RV. While it is
clearly a mapping from Ω to the reals, it is unclear that it satisfies the additional
mathematical requirement of measurability, i.e., that for every ξ ∈ R the set

\[ \biggl\{ \omega \in \Omega : \int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt \le \xi \biggr\} \]

be an event, i.e., an element of F.


We ask the reader to take it on faith that these issues can be resolved and to focus 
on the relatively straightforward computation of the mean and variance of (25.44). 
The resolution of the measurability issues is provided in Proposition 25.10.1, whose 
proof is recommended only to readers with background in Measure Theory. 


We shall assume throughout that (X(t)) is WSS and that the deterministic function
s: R → R is integrable. We begin by heuristically deriving the mean:

\[
\begin{aligned}
\mathrm{E}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr] &= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\, s(t)\bigr]\, dt \\
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\bigr]\, s(t)\, dt \\
&= \mathrm{E}\bigl[X(0)\bigr] \int_{-\infty}^{\infty} s(t)\, dt,
\end{aligned}
\tag{25.45}
\]

with the following heuristic justification. The first equality follows by swapping
the expectation with the time-integration; the second because s(·) is deterministic;
and the last equality from our assumption that (X(t)) is WSS, which implies that
(X(t)) is of constant mean: E[X(t)] = E[X(0)] for all t ∈ R.


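We next heuristically derive the variance of the integral in terms of the
autocovariance function Kxx of the process (X(t)). We begin by considering the case where
(X(t)) is of zero mean. In this case we have

\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr]
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr)^{2}\Biggr] \\
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr) \biggl(\int_{-\infty}^{\infty} X(\tau)\, s(\tau)\, d\tau\biggr)\Biggr] \\
&= \mathrm{E}\biggl[\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X(t)\, s(t)\, X(\tau)\, s(\tau)\, dt\, d\tau\biggr] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, s(\tau)\, \mathrm{E}\bigl[X(t)\, X(\tau)\bigr]\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau,
\end{aligned}
\tag{25.46}
\]

where the first equality follows because (25.45) and our assumption that (X(t))
is centered combine to guarantee that ∫ X(t) s(t) dt is of zero mean; the second
by writing a² as a times a; the third by writing the product of integrals over R
as a double integral (i.e., as an integral over R²); the fourth by swapping the
double-integral with the expectation; and the final equality by the definition of the
autocovariance function (Definition 25.4.4) and because (X(t)) is centered.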


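There are two equivalent ways of writing the RHS of (25.46) that we wish to point
out. The first is obtained from (25.46) by changing the integration variables from
(t,τ) to (σ,τ), where σ ≜ t − τ, and by performing the integration first over τ and
then over σ:

\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr]
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(\sigma+\tau)\, K_{XX}(\sigma)\, s(\tau)\, d\sigma\, d\tau \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s(\sigma+\tau)\, s(\tau)\, d\tau\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma,
\end{aligned}
\tag{25.47}
\]

where Rss is the self-similarity function of s (Definition 11.2.1 and Section 11.4).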


The second equivalent way of writing (25.46) can be derived from (25.47) when
(X(t)) is of PSD Sxx. Since (25.47) has the form of an inner product, we can use
Proposition 6.2.4 to write this inner product in the frequency domain by noting
that the FT of Rss is f ↦ |ŝ(f)|² (see (11.35)) and that Kxx is the IFT of its
PSD Sxx. The result is that

\[ \operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr] = \int_{-\infty}^{\infty} S_{XX}(f)\, \bigl|\hat{s}(f)\bigr|^{2}\, df. \tag{25.48} \]

We next show that (25.46) (and hence also (25.47) & (25.48), which are equivalent
ways of writing (25.46)) remains valid also when (X(t)) is of mean μ (not
necessarily zero). To see this we can consider the zero-mean SP (X̃(t)) defined at every
epoch t ∈ R by X̃(t) = X(t) − μ and formally compute


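\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr]
&= \operatorname{Var}\biggl[\int_{-\infty}^{\infty} \bigl(\tilde{X}(t) + \mu\bigr)\, s(t)\, dt\biggr] \\
&= \operatorname{Var}\biggl[\int_{-\infty}^{\infty} \tilde{X}(t)\, s(t)\, dt + \mu \int_{-\infty}^{\infty} s(t)\, dt\biggr] \\
&= \operatorname{Var}\biggl[\int_{-\infty}^{\infty} \tilde{X}(t)\, s(t)\, dt\biggr] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{\tilde{X}\tilde{X}}(t-\tau)\, s(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau,
\end{aligned}
\tag{25.49}
\]

where the first equality follows from the definition of X̃(t) as X(t) − μ; the second
by the linearity of integration; the third because adding a deterministic quantity
to a RV does not change its variance; the fourth by (25.46) applied to the
zero-mean process (X̃(t)); and the final equality because the autocovariance function
of (X̃(t)) is the same as the autocovariance function of (X(t)) (Definition 25.4.4).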


As above, once a result is proved for centered stochastic processes, its extension
to WSS stochastic processes with a mean can be straightforward. Consequently,
we shall often derive our results for centered WSS stochastic processes and leave
it to the reader to extend them to mean-μ stochastic processes by expressing such
stochastic processes as the sum of a zero-mean SP and the deterministic constant μ.
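
The agreement of the three expressions (25.46), (25.47), and (25.48) for the variance can be checked numerically. The Python sketch below assumes a hypothetical example that is not taken from the text: Kxx(τ) = e^{−|τ|}, whose PSD is Sxx(f) = 2/(1 + (2πf)²), and the unit pulse s(t) = 1 for 0 ≤ t ≤ 1. For this choice Rss is the triangle σ ↦ max(0, 1 − |σ|), |ŝ(f)|² is sinc²(f), and a direct calculation of (25.47) gives 2/e. The helper `midpoint` and the truncation of the frequency integral to |f| ≤ 50 are illustrative choices.

```python
import math

# Assumed example (not from the text): K_XX(tau) = exp(-|tau|), whose PSD is
# S_XX(f) = 2 / (1 + (2 pi f)^2), and s(t) = 1 for 0 <= t <= 1, else 0.

def K(tau):
    return math.exp(-abs(tau))

def S_xx(f):
    return 2.0 / (1.0 + (2 * math.pi * f) ** 2)

def R_ss(sigma):
    """Self-similarity function of the unit pulse: a triangle supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(sigma))

def s_hat_sq(f):
    """|s^(f)|^2 for the unit pulse: (sin(pi f) / (pi f))^2."""
    if f == 0.0:
        return 1.0
    x = math.pi * f
    return (math.sin(x) / x) ** 2

def midpoint(g, lo, hi, n):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

# (25.46): double integral of s(t) K_XX(t - tau) s(tau) over the unit square.
var_double = midpoint(
    lambda t: midpoint(lambda tau: K(t - tau), 0.0, 1.0, 300), 0.0, 1.0, 300)
# (25.47): integral of K_XX(sigma) R_ss(sigma).
var_time = midpoint(lambda sg: K(sg) * R_ss(sg), -1.0, 1.0, 20000)
# (25.48): integral of S_XX(f) |s^(f)|^2, truncated to |f| <= 50.
var_freq = midpoint(lambda f: S_xx(f) * s_hat_sq(f), -50.0, 50.0, 200000)

print(var_double, var_time, var_freq, 2 / math.e)
```

All three numbers should agree with 2/e up to quadrature and truncation error.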


As promised, we now state the results about the mean and variance of (25.44) in 
a mathematically defensible proposition. 


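Proposition 25.10.1 (Mean and Variance of Linear Functionals of a WSS SP).
Let (X(t)) be a measurable WSS SP defined over the probability space (Ω, F, P)
and having the autocovariance function Kxx. Let s: R → R be some deterministic
integrable function. Then:

(i) For every ω ∈ Ω the mapping t ↦ X(ω,t) s(t) is Lebesgue measurable.

(ii) The set

\[ \mathcal{N} \triangleq \biggl\{ \omega \in \Omega : \int_{-\infty}^{\infty} \bigl|X(\omega,t)\, s(t)\bigr|\, dt = \infty \biggr\} \tag{25.50} \]

is an event and is of probability zero.

(iii) The mapping from Ω \ 𝒩 to R defined by

\[ \omega \mapsto \int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt \tag{25.51} \]

is measurable with respect to F.

(iv) The mapping from Ω to R defined by

\[ \omega \mapsto \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt & \text{if } \omega \notin \mathcal{N}, \\ 0 & \text{otherwise,} \end{cases} \tag{25.52} \]

defines a random variable.

(v) The mean of this RV is

\[ \mathrm{E}\bigl[X(0)\bigr] \int_{-\infty}^{\infty} s(t)\, dt. \]

(vi) Its variance is

\[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, d\tau\, dt, \tag{25.53} \]

which can also be expressed as

\[ \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma, \tag{25.54} \]

where Rss is the self-similarity function of s.

(vii) If (X(t)) is of PSD Sxx, then the variance of this RV can be expressed as

\[ \int_{-\infty}^{\infty} S_{XX}(f)\, \bigl|\hat{s}(f)\bigr|^{2}\, df. \tag{25.55} \]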


Proof. Part (i) follows because the measurability of the process (X(t)) guarantees
that for every ω ∈ Ω the mapping t ↦ X(ω,t) is Borel measurable and hence a
fortiori Lebesgue measurable; see (Billingsley, 1995, Chapter 3, Section 18,
Theorem 18.1 (ii)).

If s happens to be Borel measurable, then Parts (ii)-(v) follow directly by Fubini’s
Theorem (Billingsley, 1995, Chapter 3, Section 18, Theorem 18.3) because in this
case the mapping (ω,t) ↦ X(ω,t) s(t) is measurable (with respect to the product
of F by the Borel σ-algebra on the real line) and because

\[
\begin{aligned}
\int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)\, s(t)|\bigr]\, dt &= \int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)|\bigr]\, |s(t)|\, dt \\
&\le \sqrt{\mathrm{E}\bigl[X^2(0)\bigr]} \int_{-\infty}^{\infty} |s(t)|\, dt \\
&< \infty,
\end{aligned}
\]

where the first inequality follows from (25.16), and where the second inequality
follows from our assumption that s is integrable.

To prove Parts (i)-(v) for the case where s is Lebesgue measurable but not Borel
measurable, recall that every Lebesgue measurable function is equal (except on
a set of Lebesgue measure zero) to a Borel measurable function (Rudin, 1974,
Chapter 7, Lemma 1), and note that the RHS of (25.50) and the mappings in
(25.51) and (25.52) are unaltered when s is replaced with a function that is identical
to it outside a set of Lebesgue measure zero.


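We next prove Part (vi) under the assumption that (X(t)) is centered. The more
general case then follows from the argument leading to (25.49). To prove Part (vi)
we need to justify the steps leading to (25.46). For the reader’s convenience we
repeat these steps here and then proceed to justify them.

\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr]
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr)^{2}\Biggr] \\
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr) \biggl(\int_{-\infty}^{\infty} X(\tau)\, s(\tau)\, d\tau\biggr)\Biggr] \\
&= \mathrm{E}\biggl[\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X(t)\, s(t)\, X(\tau)\, s(\tau)\, dt\, d\tau\biggr] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, s(\tau)\, \mathrm{E}\bigl[X(t)\, X(\tau)\bigr]\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t)\, K_{XX}(t-\tau)\, s(\tau)\, dt\, d\tau.
\end{aligned}
\]

The first equality holds because for centered processes, by Part (v), the RV on the
LHS is of zero mean; the second follows by writing a² as a times a; the third follows
because for ω’s satisfying ∫ |X(ω,t) s(t)| dt < ∞ we can use Fubini’s Theorem to
replace the iterated integrals with a double integral and because other ω’s occur
with zero probability and therefore do not influence the expectation; the fourth
equality entails swapping the expectation with the integration over R² and can be
justified by Fubini’s Theorem because, by (25.17),

\[
\begin{aligned}
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |s(t)\, s(\tau)|\, \mathrm{E}\bigl[|X(t)\, X(\tau)|\bigr]\, dt\, d\tau
&\le K_{XX}(0) \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} |s(t)|\, |s(\tau)|\, dt\, d\tau \\
&= K_{XX}(0)\, \|s\|_1^2 \\
&< \infty;
\end{aligned}
\]

and the final equality follows from the definition of the autocovariance function
(Definition 25.4.4).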


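Having derived (25.53) we can derive (25.54) by following the steps leading to
(25.47). The only issue that needs clarification is the justification for replacing
the integral over R² with the iterated integrals. This is justified using Fubini’s
Theorem by noting that, by (25.15), |Kxx(σ)| ≤ Kxx(0) and that s is integrable:

\[
\begin{aligned}
\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \bigl|s(\tau)\, s(\sigma+\tau)\, K_{XX}(\sigma)\bigr|\, d\sigma\, d\tau
&\le K_{XX}(0) \int_{-\infty}^{\infty} |s(\tau)| \int_{-\infty}^{\infty} |s(\sigma+\tau)|\, d\sigma\, d\tau \\
&= K_{XX}(0)\, \|s\|_1^2 \\
&< \infty.
\end{aligned}
\]

Finally, Part (vii) follows from (25.54) and from Proposition 6.2.4 by noting that,
by (11.34) & (11.35), Rss is integrable and of FT

\[ \hat{R}_{ss}(f) = \bigl|\hat{s}(f)\bigr|^{2}, \quad f \in \mathbb{R}, \]

and that, by Definition 25.7.2, if Sxx is the PSD of (X(t)), then Sxx is integrable
and its IFT is Kxx, i.e.,

\[ K_{XX}(\sigma) = \int_{-\infty}^{\infty} S_{XX}(f)\, e^{i 2\pi f \sigma}\, df, \quad \sigma \in \mathbb{R}. \]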
Note 25.10.2.

(i) In the future we shall sometimes write

\[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt \]

instead of the mathematically more explicit (25.52) & (25.50). Sometimes,
however, we shall make the argument ω ∈ Ω more explicit:

\[ \biggl(\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr)(\omega) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt & \text{if } \displaystyle\int_{-\infty}^{\infty} \bigl|X(\omega,t)\, s(t)\bigr|\, dt < \infty, \\ 0 & \text{otherwise.} \end{cases} \]

(ii) If s_1 and s_2 are indistinguishable integrable real signals (Definition 2.5.2),
then the random variables ∫ X(t) s_1(t) dt and ∫ X(t) s_2(t) dt are identical.

(iii) For every α ∈ R

\[ \int_{-\infty}^{\infty} X(t)\, \bigl(\alpha\, s(t)\bigr)\, dt = \alpha \int_{-\infty}^{\infty} X(t)\, s(t)\, dt. \tag{25.56} \]

(iv) We caution the very careful readers that if s_1 and s_2 are integrable
functions, then there may be some ω’s in Ω for which the stochastic integral
(∫ X(t) (s_1(t) + s_2(t)) dt)(ω) is not equal to the sum of the stochastic
integrals (∫ X(t) s_1(t) dt)(ω) and (∫ X(t) s_2(t) dt)(ω). This can happen,
for example, if the trajectory t ↦ X(ω,t) corresponding to ω is such
that either ∫ |X(ω,t) s_1(t)| dt or ∫ |X(ω,t) s_2(t)| dt is infinite, but not both.
Fortunately, as we shall see in Lemma 25.10.3, such ω’s occur with zero
probability.

(v) The value that we have chosen to assign to the integral in (25.52) when ω is
in 𝒩 is immaterial. Such ω’s occur with zero probability, so this value does
not influence the distribution of the integral.⁷


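Lemma 25.10.3 (“Almost” Linearity of Stochastic Integration). Let (X(t)) be
a measurable WSS SP, let s_1, ..., s_m: R → R be integrable, and let γ_1, ..., γ_m be
real. Then the random variables

\[ \omega \mapsto \Biggl(\int_{-\infty}^{\infty} X(t) \biggl(\sum_{j=1}^{m} \gamma_j\, s_j(t)\biggr) dt\Biggr)(\omega) \tag{25.57} \]

and

\[ \omega \mapsto \sum_{j=1}^{m} \gamma_j \biggl(\int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt\biggr)(\omega) \tag{25.58} \]

differ on at most a set of ω’s of probability zero. In particular, the two random
variables have the same distribution.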


Note 25.10.4. In view of this lemma we shall write, somewhat imprecisely,

\[ \int_{-\infty}^{\infty} X(t)\, \bigl(\alpha_1 s_1(t) + \alpha_2 s_2(t)\bigr)\, dt = \alpha_1 \int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt + \alpha_2 \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt. \]

⁷ The value zero is convenient because it guarantees that (25.56) holds even for ω’s for which
the mapping t ↦ X(ω,t) s(t) is not integrable.

Proof of Lemma 25.10.3. Let (Ω, F, P) be the probability space over which the
SP (X(t)) is defined. Define the function

\[ s_0 \colon t \mapsto \sum_{j=1}^{m} \gamma_j\, s_j(t) \tag{25.59} \]

and the sets

\[ \mathcal{N}_j = \biggl\{ \omega \in \Omega : \int_{-\infty}^{\infty} \bigl|X(\omega,t)\, s_j(t)\bigr|\, dt = \infty \biggr\}, \quad j = 0, 1, \ldots, m. \]

By (25.59) and the Triangle Inequality (2.12)

\[ \bigl|X(\omega,t)\, s_0(t)\bigr| \le \sum_{j=1}^{m} |\gamma_j|\, \bigl|X(\omega,t)\, s_j(t)\bigr|, \quad \omega \in \Omega,\ t \in \mathbb{R}, \]

which implies that

\[ \mathcal{N}_0 \subseteq \bigcup_{j=1}^{m} \mathcal{N}_j. \]

By the Union Bound (or more specifically by Corollary 21.5.2 (i)), the set on the
RHS is of probability zero. The proof is concluded by noting that, outside this
set, the random variables (25.57) and (25.58) are identical. This follows because,
for ω’s outside this set, all the integrals are finite so linearity holds.


25.11 Linear Functionals of Gaussian Processes 


We continue our discussion of integrals of the form ∫ X(t) s(t) dt, but this time with
the additional assumption that (X(t)) is Gaussian. The main result of this section
is Proposition 25.11.1, which states that, subject to some technical conditions, the
result of this integral is a Gaussian RV. In fact, Proposition 25.11.1 is a bit more
general and addresses expressions of the form

\[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu), \tag{25.60} \]

where (X(t)) is a stationary Gaussian process, s: R → R is integrable, n is an
arbitrary nonnegative integer, and the coefficients α_1, ..., α_n ∈ R and the epochs
t_1, ..., t_n ∈ R are arbitrary. It shows that, subject to the additional technical
condition that (X(t)) is measurable, the result of (25.60) is a Gaussian RV.
Consequently, its distribution is fully specified by its mean and variance, which, as we
shall see, can be easily computed from the autocovariance function Kxx.


The proof of the Gaussianity of (25.60) (Proposition 25.11.1 ahead) is technical,
so we encourage the reader to focus on the following heuristic argument. Suppose
that the integral is a Riemann integral and that we can therefore approximate it
with a finite sum

\[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt \approx \delta \sum_{k=-K}^{K} X(\delta k)\, s(\delta k) \]

for some large enough K and small enough δ > 0. (Do not bother trying to sort
out the exact sense in which this approximation holds. This is, after all, a heuristic
argument.) Consequently, we can approximate (25.60) by

\[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu) \approx \sum_{k=-K}^{K} \delta\, s(\delta k)\, X(\delta k) + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu). \tag{25.61} \]

But the RHS of the above is just a linear combination of the random variables

\[ X(-K\delta), \ldots, X(K\delta), X(t_1), \ldots, X(t_n), \]

which are jointly Gaussian because (X(t)) is a Gaussian SP. Since a linear
functional of jointly Gaussian random variables is Gaussian (Theorem 23.6.17), the
RHS of (25.61) is Gaussian, thus making it plausible that its LHS is also Gaussian.
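
The heuristic Riemann-sum argument can be explored by simulation. The Python sketch below is only an illustration under assumptions that do not come from the text: a centered stationary Gaussian SP with Kxx(τ) = e^{−|τ|}, sampled on the grid δk by a first-order recursion that reproduces this autocovariance exactly at the sample epochs, the unit pulse s(t) = 1 for 0 ≤ t < 1, and no additional terms (n = 0). For this example the variance predicted by (25.47) is 2/e. The seed, grid spacing, and number of trials are arbitrary choices.

```python
import math
import random

# Assumed example (not from the text): a centered stationary Gaussian SP with
# K_XX(tau) = exp(-|tau|), sampled at the epochs delta * k via the recursion
# X[k+1] = rho X[k] + sqrt(1 - rho^2) Z[k] with rho = exp(-delta), which gives
# Cov[X[j], X[k]] = exp(-delta |j - k|). The signal is s(t) = 1 for 0 <= t < 1.
DELTA = 0.01
N_STEPS = 100
RHO = math.exp(-DELTA)

def riemann_sum(rng):
    """One realization of delta * sum_k X(delta k) s(delta k)."""
    x = rng.gauss(0.0, 1.0)  # X(0) ~ N(0, K_XX(0))
    total = 0.0
    for _ in range(N_STEPS):
        total += x * DELTA
        x = RHO * x + math.sqrt(1.0 - RHO * RHO) * rng.gauss(0.0, 1.0)
    return total

rng = random.Random(0)  # fixed seed for reproducibility
samples = [riemann_sum(rng) for _ in range(4000)]

mean = sum(samples) / len(samples)
var = sum((y - mean) ** 2 for y in samples) / (len(samples) - 1)
target_var = 2 / math.e  # value of (25.47) for this K_XX and this s
# For a Gaussian RV, about 68.27% of the mass lies within one standard deviation.
frac_1sd = sum(abs(y - mean) <= math.sqrt(var) for y in samples) / len(samples)

print(mean, var, target_var, frac_1sd)
```

The empirical variance should be near 2/e and the one-standard-deviation fraction near 0.68, which is consistent with, though of course no proof of, the Gaussianity claimed for the limit.
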
Before stating the main result of this section in a mathematically defensible way,
we now proceed to compute the mean and variance of (25.60). We assume that s(·)
is integrable and that (X(t)) is measurable and WSS. (Gaussianity is inessential
for the computation of the mean and variance.) The computation is very similar
to the one leading to (25.45) and (25.46). For the mean we have:

\[
\begin{aligned}
\mathrm{E}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr]
&= \mathrm{E}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr] + \sum_{\nu=1}^{n} \alpha_\nu\, \mathrm{E}\bigl[X(t_\nu)\bigr] \\
&= \mathrm{E}\bigl[X(0)\bigr] \biggl(\int_{-\infty}^{\infty} s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu\biggr),
\end{aligned}
\tag{25.62}
\]

where the first equality follows from the linearity of expectation and where the
second equality follows from (25.45) and from the wide-sense stationarity of (X(t)),
which implies that E[X(t)] = E[X(0)], for all t ∈ R.

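For the purpose of computing the variance of (25.60), we assume that (X(t)) is
centered. The result continues to hold if (X(t)) has a nonzero mean, because the
mean of (X(t)) does not influence the variance of (25.60). We begin by expanding
the variance as

\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr]
&= \operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr] + \operatorname{Var}\biggl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr] \\
&\quad + 2 \sum_{\nu=1}^{n} \alpha_\nu \operatorname{Cov}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(t_\nu)\biggr]
\end{aligned}
\tag{25.63}
\]

and by noting that, by (25.47),

\[ \operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt\biggr] = \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma \tag{25.64} \]

and that, by (25.24),

\[ \operatorname{Var}\biggl[\sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr] = \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K_{XX}(t_\nu - t_{\nu'}). \tag{25.65} \]

To complete the computation of the variance of (25.60) it remains to compute the
covariance in the last term in (25.63):

\[
\begin{aligned}
\operatorname{Cov}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\ X(t_\nu)\biggr]
&= \mathrm{E}\biggl[\int_{-\infty}^{\infty} X(t)\, X(t_\nu)\, s(t)\, dt\biggr] \\
&= \int_{-\infty}^{\infty} s(t)\, K_{XX}(t - t_\nu)\, dt.
\end{aligned}
\tag{25.66}
\]

Combining (25.63) with (25.64)-(25.66) we obtain

\[
\begin{aligned}
\operatorname{Var}\biggl[\int_{-\infty}^{\infty} X(t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr]
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma \\
&\quad + \sum_{\nu=1}^{n} \sum_{\nu'=1}^{n} \alpha_\nu \alpha_{\nu'}\, K_{XX}(t_\nu - t_{\nu'}) + 2 \sum_{\nu=1}^{n} \alpha_\nu \int_{-\infty}^{\infty} s(t)\, K_{XX}(t - t_\nu)\, dt.
\end{aligned}
\tag{25.67}
\]

We now state the main result about linear functionals of Gaussian stochastic
processes. The proof is recommended for mathematically-inclined readers only.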


Proposition 25.11.1 (Linear Functional of Stationary Gaussian Processes).
Consider the setup of Proposition 25.10.1 with the additional assumption that the
process (X(t)) is Gaussian. Additionally introduce the coefficients α_1, ..., α_n ∈ R and
the epochs t_1, ..., t_n ∈ R for some n ∈ N. Then there exists an event 𝒩 ∈ F of
zero probability such that for all ω ∉ 𝒩 the mapping t ↦ X(ω,t) s(t) is a Lebesgue
integrable function:

\[ \bigl(\text{the mapping } t \mapsto X(\omega,t)\, s(t) \text{ is in } \mathcal{L}_1\bigr), \quad \omega \notin \mathcal{N}, \tag{25.68a} \]

and the mapping from Ω to R

\[ \omega \mapsto \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(\omega,t_\nu) & \text{if } \omega \notin \mathcal{N}, \\ 0 & \text{otherwise} \end{cases} \tag{25.68b} \]

is a Gaussian RV whose mean and variance are given in (25.62) and (25.67).


Proof. We prove this result when (X(t)) is centered. The extension to the more
general case follows by noting that adding a deterministic constant to a zero-mean
Gaussian results in a Gaussian. We also assume that s(·) is Borel measurable,
because once the theorem is established for this case it immediately also extends
to the case where s(·) is only Lebesgue measurable by noting that every Lebesgue
measurable function is equal almost everywhere to a Borel measurable function.

The existence of the event 𝒩 and the fact that the mapping (25.68b) is a RV follow
from Proposition 25.10.1. We next show that the RV

\[ Y(\omega) \triangleq \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(\omega,t_\nu) & \text{if } \omega \notin \mathcal{N}, \\ 0 & \text{otherwise,} \end{cases} \tag{25.69} \]

is Gaussian.

To that end, define for every k ∈ N the function

\[ s_k(t) = \begin{cases} s(t) & \text{if } |t| \le k \text{ and } |s(t)| \le \sqrt{k}, \\ 0 & \text{otherwise,} \end{cases} \quad t \in \mathbb{R}. \tag{25.70} \]

Note that for every ω ∈ Ω

\[ \lim_{k \to \infty} X(\omega,t)\, s_k(t) = X(\omega,t)\, s(t), \quad t \in \mathbb{R}, \]

and

\[ \bigl|X(\omega,t)\, s_k(t)\bigr| \le \bigl|X(\omega,t)\, s(t)\bigr|, \quad t \in \mathbb{R}, \]

so, by the Dominated Convergence Theorem and (25.68a),

\[ \lim_{k \to \infty} \int_{-\infty}^{\infty} X(\omega,t)\, s_k(t)\, dt = \int_{-\infty}^{\infty} X(\omega,t)\, s(t)\, dt, \quad \omega \notin \mathcal{N}. \tag{25.71} \]

Define now for every k ∈ N the RV

\[ Y_k(\omega) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s_k(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(\omega,t_\nu) & \text{if } \omega \notin \mathcal{N}, \\ 0 & \text{otherwise.} \end{cases} \tag{25.72} \]

It follows from (25.71) that the sequence Y_1, Y_2, ... converges almost surely to Y.
To prove that Y is Gaussian, it thus suffices to prove that for every k ∈ N the RV
Y_k is Gaussian (Theorem 19.9.1).


To prove that Y_k is Gaussian, we begin by showing that it is of finite variance. To
that end, it suffices to show that the RV

\[ \tilde{Y}_k(\omega) \triangleq \begin{cases} \displaystyle\int_{-\infty}^{\infty} X(\omega,t)\, s_k(t)\, dt & \text{if } \omega \notin \mathcal{N}, \\ 0 & \text{otherwise} \end{cases} \tag{25.73} \]

is of finite variance. We prove this by using the definition of s_k(·) (25.70) and by
using the Cauchy-Schwarz Inequality to show that for every ω ∉ 𝒩

\[
\begin{aligned}
\tilde{Y}_k^2(\omega) &= \biggl(\int_{-\infty}^{\infty} X(\omega,t)\, s_k(t)\, dt\biggr)^{2} \\
&= \biggl(\int_{-k}^{k} X(\omega,t)\, s_k(t)\, dt\biggr)^{2} \\
&\le \int_{-k}^{k} X^2(\omega,t)\, dt \int_{-k}^{k} s_k^2(t)\, dt \\
&\le \biggl(\int_{-k}^{k} X^2(\omega,t)\, dt\biggr)\, 2k^2,
\end{aligned}
\]

where the equality in the first line follows from the definition of Ỹ_k (25.73); the
equality in the second line from the definition of s_k(·) (25.70); the inequality in the
third line from the Cauchy-Schwarz Inequality; and the final inequality again by
(25.70). Since 𝒩 is an event of probability zero, it follows from this inequality that

\[ \mathrm{E}\bigl[\tilde{Y}_k^2\bigr] \le 4k^3\, K_{XX}(0) < \infty, \]

thus establishing that Ỹ_k, and hence also Y_k, is of finite variance.


To prove that Y_k is Gaussian we shall use some results about the Hilbert space
L²(Ω, F, P) of (the equivalence classes of) the random variables that are defined
over (Ω, F, P) and that have a finite second moment; see, for example, (Shiryaev,
1996, Chapter II, Section 11). Let G denote the closed linear subspace of L²(Ω, F, P)
that is generated by the random variables (X(t), t ∈ R). Thus, G contains all
finite linear combinations of the random variables (X(t), t ∈ R) as well as the
mean-square limits of such linear combinations. Since the process (X(t), t ∈ R)
is Gaussian, it follows that all such linear combinations are Gaussian. And since
mean-square limits of Gaussian random variables are Gaussian (Theorem 19.9.1),
it follows that G contains only random variables that have a Gaussian
distribution (Shiryaev, 1996, Chapter II, Section 13, Paragraph 6). To prove that Y_k is
Gaussian it thus suffices to show that it is an element of G.


To prove that Y_k is an element of G, decompose Y_k as

\[ Y_k = Y_k^{\mathcal{G}} + Y_k^{\perp}, \tag{25.74} \]

where Y_k^G is the projection of Y_k onto G and where Y_k^⊥ is consequently
perpendicular to every element of G and a fortiori to all the random variables (X(t), t ∈ R):

\[ \mathrm{E}\bigl[X(t)\, Y_k^{\perp}\bigr] = 0, \quad t \in \mathbb{R}. \tag{25.75} \]

Since Y_k is of finite variance, this decomposition is possible and

\[ \mathrm{E}\bigl[(Y_k^{\mathcal{G}})^2\bigr],\ \mathrm{E}\bigl[(Y_k^{\perp})^2\bigr] < \infty. \tag{25.76} \]

To prove that Y_k is an element of G we shall next show that E[(Y_k^⊥)²] = 0 or,
equivalently (in view of (25.74)), that

\[ \mathrm{E}\bigl[Y_k\, Y_k^{\perp}\bigr] = 0. \tag{25.77} \]


To establish (25.77) we evaluate its LHS as follows:

\[
\begin{aligned}
\mathrm{E}\bigl[Y_k\, Y_k^{\perp}\bigr]
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt + \sum_{\nu=1}^{n} \alpha_\nu X(t_\nu)\biggr) Y_k^{\perp}\Biggr] \\
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\biggr) Y_k^{\perp}\Biggr] + \sum_{\nu=1}^{n} \alpha_\nu\, \mathrm{E}\bigl[X(t_\nu)\, Y_k^{\perp}\bigr] \\
&= \mathrm{E}\Biggl[\biggl(\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt\biggr) Y_k^{\perp}\Biggr] \\
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\, s_k(t)\, Y_k^{\perp}\bigr]\, dt \\
&= \int_{-\infty}^{\infty} \mathrm{E}\bigl[X(t)\, Y_k^{\perp}\bigr]\, s_k(t)\, dt \\
&= 0,
\end{aligned}
\]

where the first equality follows from the definition of Y_k (25.72); the second from
the linearity of expectation; the third from the orthogonality (25.75); the fourth by
an application of Fubini’s Theorem that we shall justify shortly; the fifth because
s_k(·) is a deterministic function; and the final equality again by (25.75). This
establishes (25.77) subject to a verification that the conditions of Fubini’s Theorem
are satisfied, a verification we conduct now. That (ω,t) ↦ X(ω,t) Y_k^⊥(ω) s_k(t) is
measurable follows because (X(t), t ∈ R) is a measurable SP; Y_k^⊥, being a RV,
is measurable with respect to F; and because the Borel measurability of s(·) also
implies the Borel measurability of s_k(·). The integrability of this function follows
from the Cauchy-Schwarz Inequality for random variables

\[
\begin{aligned}
\int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)\, Y_k^{\perp}|\bigr]\, |s_k(t)|\, dt
&\le \int_{-\infty}^{\infty} \sqrt{\mathrm{E}\bigl[X^2(t)\bigr]\, \mathrm{E}\bigl[(Y_k^{\perp})^2\bigr]}\ |s_k(t)|\, dt \\
&\le \sqrt{K_{XX}(0)}\, \sqrt{\mathrm{E}\bigl[(Y_k^{\perp})^2\bigr]}\ 2k\sqrt{k} \\
&< \infty,
\end{aligned}
\]

where the second inequality follows from the definition of s_k(·) (25.70), and where
the third inequality follows from (25.76). This justifies the use of Fubini’s Theorem
in the proof of (25.77). We have thus demonstrated that Y_k is in G, and hence, like
all elements of G, is Gaussian. This concludes the proof of the Gaussianity of Y_k
for every k ∈ N and hence the Gaussianity of Y.


It only remains to verify that the mean and variance of Y are as stated in the
theorem. The only part of the derivation of (25.67) that we have not yet justified
is the derivation of (25.66) and, in particular, the swapping of the expectation and
integration. But this is easily justified using Fubini’s Theorem because, by (25.17),

\[ \int_{-\infty}^{\infty} \mathrm{E}\bigl[|X(t)\, X(t_\nu)|\bigr]\, |s(t)|\, dt \le \Bigl(K_{XX}(0) + \mathrm{E}\bigl[X(0)\bigr]^2\Bigr)\, \|s\|_1 < \infty. \tag{25.78} \]


Proposition 25.11.1 is extremely powerful because it allows us to determine the 
distribution of a linear functional of a Gaussian SP from its mean and variance. 
In the next section we shall extend this result and show that any finite number of 
linear functionals of a Gaussian SP are jointly Gaussian. Their joint distribution 
is thus fully determined by the mean vector and the covariance matrix, which, as 
we shall see, can be readily computed from the autocovariance function. 
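
To make concrete how the mean (25.62) and the variance (25.67) determine the distribution, the following Python sketch evaluates both for a hypothetical example and then computes a tail probability of the resulting Gaussian RV. Everything specific here is an assumption for illustration and not taken from the text: the mean E[X(0)] = 0.5, the autocovariance Kxx(τ) = e^{−|τ|}, the unit pulse s(t) = 1 on [0, 1], and a single additional term with α_1 = 2 and t_1 = 0.25.

```python
import math

# Assumed example (not from the text): a measurable stationary Gaussian SP with
# mean E[X(0)] = 0.5 and K_XX(tau) = exp(-|tau|); s(t) = 1 on [0, 1], else 0;
# a single extra term (n = 1) with alpha_1 = 2 at the epoch t_1 = 0.25.
MU, ALPHA, T1 = 0.5, 2.0, 0.25

def K(tau):
    return math.exp(-abs(tau))

def midpoint(g, lo, hi, n=20000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

# Mean as in (25.62): E[X(0)] * (integral of s + sum of the alphas).
mean = MU * (1.0 + ALPHA)

# Variance as in (25.67), term by term.
term_integral = midpoint(lambda sg: K(sg) * max(0.0, 1.0 - abs(sg)), -1.0, 1.0)
term_samples = ALPHA * ALPHA * K(0.0)
term_cross = 2.0 * ALPHA * midpoint(lambda t: K(t - T1), 0.0, 1.0)
variance = term_integral + term_samples + term_cross

def prob_exceeds(y):
    """Pr[Y > y] for Y ~ N(mean, variance), via the complementary error function."""
    z = (y - mean) / math.sqrt(variance)
    return 0.5 * math.erfc(z / math.sqrt(2.0))

print(mean, variance, prob_exceeds(mean), prob_exceeds(5.0))
```

Because the functional is Gaussian, no further information about the process is needed to compute such probabilities once the two numbers `mean` and `variance` are known.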


25.12 The Joint Distribution of Linear Functionals 


Let us now shift our focus from the distribution of a single linear functional to the
joint distribution of a collection of such functionals. Specifically, we consider m
functionals

\[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, X(t_{j,\nu}), \quad j = 1, \ldots, m \tag{25.79} \]

of the measurable, stationary Gaussian SP (X(t)). Here the m real-valued
signals s_1, ..., s_m are integrable, n_1, ..., n_m are in N, and α_{j,ν}, t_{j,ν} are deterministic
constants for all ν ∈ {1, ..., n_j}.

The main result of this section is that if (X(t)) is a Gaussian SP, then the random
variables in (25.79) are jointly Gaussian.


Theorem 25.12.1 (Linear Functionals of a Gaussian SP Are Jointly Gaussian).
The m linear functionals

\[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu}\, X(t_{j,\nu}), \quad j = 1, \ldots, m \]

of a measurable, stationary, Gaussian SP (X(t)) are jointly Gaussian, whenever
m ∈ N; the m functions s_1, ..., s_m are integrable functions from R to R; the
integers {n_j} are nonnegative; and the coefficients {α_{j,ν}} and the epochs {t_{j,ν}} are
deterministic real numbers for all j ∈ {1, ..., m} and all ν ∈ {1, ..., n_j}.


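Proof. It suffices to show that any linear combination of these linear
functionals has a univariate Gaussian distribution (Theorem 23.6.17). This follows from
Proposition 25.11.1 and Lemma 25.10.3 because, by Lemma 25.10.3, for any choice
of the coefficients γ_1, ..., γ_m ∈ R the linear combination

\[ \gamma_1 \biggl(\int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt + \sum_{\nu=1}^{n_1} \alpha_{1,\nu}\, X(t_{1,\nu})\biggr) + \cdots + \gamma_m \biggl(\int_{-\infty}^{\infty} X(t)\, s_m(t)\, dt + \sum_{\nu=1}^{n_m} \alpha_{m,\nu}\, X(t_{m,\nu})\biggr) \]

has the same distribution as the linear functional

\[ \int_{-\infty}^{\infty} X(t) \biggl(\sum_{j=1}^{m} \gamma_j\, s_j(t)\biggr) dt + \sum_{j=1}^{m} \sum_{\nu=1}^{n_j} \gamma_j\, \alpha_{j,\nu}\, X(t_{j,\nu}), \]

which, by Proposition 25.11.1, has a univariate Gaussian distribution.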


It follows from Theorem 25.12.1 that if (X(t)) is a measurable, stationary, Gaussian 
SP, then the joint distribution of the random variables in (25.79) is fully specified 
by their means and their covariance matrix. If (X(t)) is centered, then by (25.62) 
these random variables are centered, so their joint distribution is determined by 
their covariance matrix. We next show how this covariance matrix can be computed 
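from the autocovariance function $K_{XX}$. To this end we assume that (X(t)) is
centered, and expand the covariance between any two such functionals as follows:
\[
\begin{aligned}
&\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt + \sum_{\nu=1}^{n_j} \alpha_{j,\nu} X(t_{j,\nu}),\;
\int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt + \sum_{\nu'=1}^{n_k} \alpha_{k,\nu'} X(t_{k,\nu'}) \Bigr] \\
&\quad = \operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt \Bigr] \\
&\qquad + \sum_{\nu=1}^{n_j} \alpha_{j,\nu} \operatorname{Cov}\Bigl[ X(t_{j,\nu}),\; \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt \Bigr] \\
&\qquad + \sum_{\nu'=1}^{n_k} \alpha_{k,\nu'} \operatorname{Cov}\Bigl[ X(t_{k,\nu'}),\; \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt \Bigr] \\
&\qquad + \sum_{\nu=1}^{n_j} \sum_{\nu'=1}^{n_k} \alpha_{j,\nu}\, \alpha_{k,\nu'} \operatorname{Cov}\bigl[ X(t_{j,\nu}), X(t_{k,\nu'}) \bigr],
\quad j,k \in \{1,\ldots,m\}.
\end{aligned}
\tag{25.80}
\]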


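The second and third terms on the RHS can be computed from the autocovariance
function $K_{XX}$ using (25.66). The fourth term can be computed from $K_{XX}$ by noting
that $\operatorname{Cov}[X(t_{j,\nu}), X(t_{k,\nu'})] = K_{XX}(t_{j,\nu} - t_{k,\nu'})$ (Definition 25.4.4). We now evaluate
the first term:
\[
\begin{aligned}
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt \Bigr]
&= \operatorname{E}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt \int_{-\infty}^{\infty} X(\tau)\, s_k(\tau)\, d\tau \Bigr] \\
&= \operatorname{E}\Bigl[ \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} X(t)\, s_j(t)\, X(\tau)\, s_k(\tau)\, dt\, d\tau \Bigr] \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \operatorname{E}[X(t) X(\tau)]\, s_j(t)\, s_k(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} K_{XX}(t - \tau)\, s_j(t)\, s_k(\tau)\, dt\, d\tau,
\end{aligned}
\tag{25.81}
\]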


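which is the generalization of (25.53). By changing variables from $(t,\tau)$ to $(t,\sigma)$,
where $\sigma = t - \tau$, we can obtain the generalization of (25.54). Starting from (25.81)
and writing $\tilde{s}_k\colon t \mapsto s_k(-t)$ for the mirror image of $s_k$,
\[
\begin{aligned}
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt \Bigr]
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} K_{XX}(t - \tau)\, s_j(t)\, s_k(\tau)\, dt\, d\tau \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s_j(t)\, s_k(t - \sigma)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma) \int_{-\infty}^{\infty} s_j(t)\, \tilde{s}_k(\sigma - t)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, (s_j \star \tilde{s}_k)(\sigma)\, d\sigma.
\end{aligned}
\tag{25.82}
\]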


If (X(t)) is of PSD $S_{XX}$, then we can rewrite (25.82) in the frequency domain
using Proposition 6.2.4 in much the same way that we rewrote (25.46) in the form
(25.48):
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_j(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_k(t)\, dt \Bigr]
= \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}_j(f)\, \hat{s}_k^{*}(f)\, df, \tag{25.83}
\]
where we have used the fact that the FT of $s_j \star \tilde{s}_k$ is the product of the FT of $s_j$
and the FT of $\tilde{s}_k$ (the mirror image $t \mapsto s_k(-t)$), and that the FT of $\tilde{s}_k$ is
$f \mapsto \hat{s}_k(-f)$, which, because $s_k$ is real, is also given by $f \mapsto \hat{s}_k^{*}(f)$.


The key second-order properties of linear functionals of measurable WSS stochastic 
processes are summarized in the following theorem. Using these properties and 
(25.80) we can compute the covariance matrix of the linear functionals in (25.79), 
a matrix which fully specifies their joint distribution whenever (X(t)) is a centered 
Gaussian SP. 


Theorem 25.12.2 (Covariance Properties of Linear Functionals of a WSS SP). 
Let (X(t)) be a measurable WSS SP. 


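(i) If the real signal s is integrable, then
\[
\operatorname{Var}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt \Bigr] = \int_{-\infty}^{\infty} K_{XX}(\sigma)\, R_{ss}(\sigma)\, d\sigma, \tag{25.84}
\]
where $R_{ss}$ is the self-similarity function of s. Furthermore, for every fixed
epoch $\tau \in \mathbb{R}$
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\; X(\tau) \Bigr] = \int_{-\infty}^{\infty} s(t)\, K_{XX}(\tau - t)\, dt, \quad \tau \in \mathbb{R}. \tag{25.85}
\]
If $s_1, s_2$ are real-valued integrable signals, then
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt \Bigr]
= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, (s_1 \star \tilde{s}_2)(\sigma)\, d\sigma, \tag{25.86}
\]
where $\tilde{s}_2\colon t \mapsto s_2(-t)$ is the mirror image of $s_2$.

(ii) If (X(t)) is of PSD $S_{XX}$, then for $s$, $s_1$, $s_2$, and $\tau$ as above
\[
\operatorname{Var}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt \Bigr] = \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{s}(f)|^2\, df, \tag{25.87}
\]
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s(t)\, dt,\; X(\tau) \Bigr] = \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}(f)\, e^{i 2\pi f \tau}\, df, \tag{25.88}
\]
and
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(t)\, s_1(t)\, dt,\; \int_{-\infty}^{\infty} X(t)\, s_2(t)\, dt \Bigr]
= \int_{-\infty}^{\infty} S_{XX}(f)\, \hat{s}_1(f)\, \hat{s}_2^{*}(f)\, df. \tag{25.89}
\]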


Proof. Most of these claims have already been proved. Indeed, (25.84) was proved 
in Proposition 25.10.1 (vi), and (25.85) was proved in Proposition 25.11.1 using 
Fubini’s Theorem and (25.78). However, (25.86) was only derived heuristically 
in (25.81) and (25.82). To rigorously justify this derivation one can use Fubini’s 
Theorem, or use the relation
\[
\operatorname{Cov}[X, Y] = \frac{1}{2} \bigl( \operatorname{Var}[X + Y] - \operatorname{Var}[X] - \operatorname{Var}[Y] \bigr)
\]
and the result for the variance, namely, (25.84).

All the results in Part (ii) of this theorem follow from the corresponding results in 
Part (i) using the definition of the PSD and Proposition 6.2.4. 
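
As a numerical illustration of Part (i), the following sketch evaluates the covariance matrix of the two functionals $\int_0^T X(t)\,dt$ and $X(\tau_0)$ using (25.84), (25.85), and $\operatorname{Var}[X(\tau_0)] = K_{XX}(0)$. The autocovariance $K_{XX}(\tau) = e^{-|\tau|}$, the signal $s = \mathrm{I}\{0 \le t \le T\}$, the values of `T` and `tau0`, and the helper `midpoint` are all assumptions chosen for illustration; they do not come from the text. For this choice the integrals have the closed forms $2(T - 1 + e^{-T})$ and $2 - e^{-\tau_0} - e^{-(T-\tau_0)}$, which the code uses as a sanity check.

```python
import math

def k_xx(tau):
    """Assumed autocovariance function K_XX(tau) = exp(-|tau|)."""
    return math.exp(-abs(tau))

def midpoint(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

T = 1.0     # s is the indicator function of [0, T] (assumed)
tau0 = 0.5  # epoch at which the process is sampled (assumed)

def r_ss(sigma):
    """Self-similarity function of the indicator of [0, T]: a triangle."""
    return max(T - abs(sigma), 0.0)

# (25.84): Var[int X(t) s(t) dt] = int K_XX(sigma) R_ss(sigma) d sigma
var_integral = midpoint(lambda sig: k_xx(sig) * r_ss(sig), -T, T)

# (25.85): Cov[int X(t) s(t) dt, X(tau0)] = int s(t) K_XX(tau0 - t) dt
cov_cross = midpoint(lambda t: k_xx(tau0 - t), 0.0, T)

# Var[X(tau0)] = K_XX(0)
var_sample = k_xx(0.0)

cov_matrix = [[var_integral, cov_cross],
              [cov_cross, var_sample]]

# Closed forms for this particular choice of K_XX and s
closed_var = 2.0 * (T - 1.0 + math.exp(-T))
closed_cov = 2.0 - math.exp(-tau0) - math.exp(-(T - tau0))

print(cov_matrix)
print(closed_var, closed_cov)
```

If (X(t)) is additionally centered and Gaussian, this 2-by-2 matrix fully specifies the joint distribution of the two functionals.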
Figure 25.2: Passing a SP (X(t)) through a stable filter of impulse response h(·).
If (X(t)) is measurable and WSS, then so is the output $(X(t)) \star h$. If, additionally,
(X(t)) is of PSD $S_{XX}$, then the output is of PSD $f \mapsto S_{XX}(f)\, |\hat{h}(f)|^2$. If (X(t)) is
additionally Gaussian, then so is the output.


25.13 Filtering WSS Processes 


We next discuss the result of passing a WSS SP through a stable filter, i.e., the 
convolution of a SP with a deterministic integrable function. Our main result is 
that, subject to some technical conditions, the following hold: 


(i) Passing a WSS SP through a stable filter produces a WSS SP. 


(ii) If the input to the filter is of PSD $S_{XX}$, then the output of the filter is of
PSD $f \mapsto S_{XX}(f)\, |\hat{h}(f)|^2$, where $\hat{h}(\cdot)$ is the filter’s frequency response.


(iii) If the input to the filter is a Gaussian SP, then so is the output. 


We state this result in Theorem 25.13.2. But first we must define the convolution 
of a SP with an integrable deterministic signal. Our approach is to build on our 
definition of linear functionals of WSS stochastic processes (Section 25.10) and to 
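define the convolution of (X(t)) with h(·) as the SP that maps every epoch $t \in \mathbb{R}$
to the RV
\[
\int_{-\infty}^{\infty} X(\sigma)\, h(t - \sigma)\, d\sigma,
\]
where the above integral is the linear functional
\[
\int_{-\infty}^{\infty} X(\sigma)\, s(\sigma)\, d\sigma \quad \text{with} \quad s\colon \sigma \mapsto h(t - \sigma).
\]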


With this approach the key results will follow by applying Theorem 25.12.2 with 
the proper substitutions. 


Definition 25.13.1 (Filtering a Stochastic Process). The convolution of a measurable,
WSS SP (X(t)) with an integrable function $h\colon \mathbb{R} \to \mathbb{R}$ is denoted by
\[
(X(t)) \star h
\]
and is defined as the SP that maps every $t \in \mathbb{R}$ to the RV
\[
\int_{-\infty}^{\infty} X(\sigma)\, h(t - \sigma)\, d\sigma, \tag{25.90}
\]
where the stochastic integral in (25.90) is the stochastic integral that was defined
in Note 25.10.2:
\[
(X(t)) \star h\colon (\omega, t) \mapsto
\begin{cases}
\int_{-\infty}^{\infty} X(\omega,\sigma)\, h(t - \sigma)\, d\sigma & \text{if } \int_{-\infty}^{\infty} |X(\omega,\sigma)\, h(t - \sigma)|\, d\sigma < \infty, \\
0 & \text{otherwise.}
\end{cases}
\]


Theorem 25.13.2. Let (Y(t)) be the result of convolving the measurable, centered,
WSS SP (X(t)) of autocovariance function $K_{XX}$ with the integrable function
$h\colon \mathbb{R} \to \mathbb{R}$.

(i) The SP (Y(t)) is centered, measurable, and WSS with autocovariance function
\[
K_{YY} = K_{XX} \star R_{hh}, \tag{25.91}
\]
where $R_{hh}$ is the self-similarity function of h (Section 11.4).

(ii) If (X(t)) is of PSD $S_{XX}$, then (Y(t)) is of PSD
\[
S_{YY}(f) = |\hat{h}(f)|^2\, S_{XX}(f), \quad f \in \mathbb{R}. \tag{25.92}
\]

(iii) For every $t, \tau \in \mathbb{R}$,
\[
\operatorname{E}[X(t)\, Y(t + \tau)] = (K_{XX} \star h)(\tau), \tag{25.93}
\]
where the RHS does not depend on t.⁸

(iv) If (X(t)) is Gaussian, then so is (Y(t)). Moreover, for every choice of
$n, m \in \mathbb{N}$ and for every choice of the epochs $t_1,\ldots,t_n,t_{n+1},\ldots,t_{n+m} \in \mathbb{R}$,
the random variables
\[
X(t_1),\ldots,X(t_n),\, Y(t_{n+1}),\ldots,Y(t_{n+m}) \tag{25.94}
\]
are jointly Gaussian.⁹


Proof. For fixed $t, \tau \in \mathbb{R}$ we use Definition 25.13.1 to express Y(t) and Y(t + τ) as
\[
Y(t) = \int_{-\infty}^{\infty} X(\sigma)\, s_1(\sigma)\, d\sigma, \tag{25.95}
\]
and
\[
Y(t + \tau) = \int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma, \tag{25.96}
\]
where
\[
s_1\colon \sigma \mapsto h(t - \sigma), \tag{25.97}
\]
\[
s_2\colon \sigma \mapsto h(t + \tau - \sigma). \tag{25.98}
\]

⁸Two stochastic processes (X(t)) and (Y(t)) are said to be jointly wide-sense stationary
if each is WSS and if E[X(t)Y(t + τ)] does not depend on t.
⁹That is, (X(t)) and (Y(t)) are jointly Gaussian processes.

We are now ready to prove Part (i). That (Y(t)) is centered follows from the
representation of Y(t) in (25.95) & (25.97) as a linear functional of (X(t)) and
from the hypothesis that (X(t)) is centered (Proposition 25.10.1).


To establish that (Y(t)) is WSS we use the representations (25.95)-(25.98) and 
Theorem 25.12.2 regarding the covariance between two linear functionals as follows. 


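\[
\begin{aligned}
\operatorname{Cov}[Y(t + \tau), Y(t)]
&= \operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma,\; \int_{-\infty}^{\infty} X(\sigma)\, s_1(\sigma)\, d\sigma \Bigr] \\
&= \int_{-\infty}^{\infty} K_{XX}(\sigma)\, (s_2 \star \tilde{s}_1)(\sigma)\, d\sigma,
\end{aligned}
\tag{25.99}
\]
where $\tilde{s}_1\colon \sigma \mapsto s_1(-\sigma)$ is the mirror image of $s_1$, and where the convolution can be evaluated as
\[
\begin{aligned}
(s_2 \star \tilde{s}_1)(\sigma)
&= \int_{-\infty}^{\infty} s_2(\mu)\, \tilde{s}_1(\sigma - \mu)\, d\mu \\
&= \int_{-\infty}^{\infty} h(t + \tau - \mu)\, h(t + \sigma - \mu)\, d\mu \\
&= \int_{-\infty}^{\infty} h(\mu' + \tau - \sigma)\, h(\mu')\, d\mu' \\
&= R_{hh}(\tau - \sigma),
\end{aligned}
\tag{25.100}
\]
where $\mu' = t + \sigma - \mu$. Combining (25.99) with (25.100) yields
\[
\operatorname{Cov}[Y(t + \tau), Y(t)] = (K_{XX} \star R_{hh})(\tau), \quad t, \tau \in \mathbb{R}, \tag{25.101}
\]
where the RHS does not depend on t. This establishes that (Y(t)) is WSS and
proves (25.91).¹⁰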


To conclude the proof of Part (i) we now show that (Y(t)) is measurable. The proof 
is technical and requires background in Measure Theory. Readers are encouraged 
to skip it and move on to the proof of Part (ii). 


We first note that, as in the proof of Proposition 25.10.1, it suffices to prove the 
result for impulse response functions h that are Borel measurable; the extension 
to Lebesgue measurable functions will then follow by approximating h by a Borel 
measurable function that differs from it on a set of Lebesgue measure zero (Rudin, 
1974, Chapter 7, Lemma 1) and by then applying Part (ii) of Note 25.10.2. We 
hence now assume that h is Borel measurable. 


¹⁰That (Y(t)) is of finite variance follows from (25.101) by setting τ = 0 and noting that
the convolution on the RHS of (25.101) is between a bounded function ($K_{XX}$) and an integrable
function ($R_{hh}$) and is thus defined and finite at every $\tau \in \mathbb{R}$ and a fortiori at τ = 0.

We shall prove that (Y(t)) is measurable by proving that the (nonstationary)
process $(\omega, t) \mapsto Y(\omega, t)/(1 + t^2)$ is measurable. This we shall prove using Fubini’s
Theorem applied to the function from $(\Omega \times \mathbb{R}) \times \mathbb{R}$ to $\mathbb{R}$ defined by
\[
\bigl( (\omega, t), \sigma \bigr) \mapsto \frac{X(\omega, \sigma)\, h(t - \sigma)}{1 + t^2},
\quad \bigl( (\omega, t) \in \Omega \times \mathbb{R},\; \sigma \in \mathbb{R} \bigr). \tag{25.102}
\]
This function is measurable because, by assumption, (X(t)) is measurable and
because the measurability of the function h(·) implies the measurability of the
function $(t, \sigma) \mapsto h(t - \sigma)$ (as proved, for example, in (Rudin, 1974, p. 157)). We
next verify that this function is integrable. To that end, we first integrate its
absolute value over $(\omega, t)$ and then over $\sigma$. The integral over $(\omega, t)$ is given by


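\[
\int_{-\infty}^{\infty} \operatorname{E}\bigl[ |X(\sigma)| \bigr]\, \frac{|h(t - \sigma)|}{1 + t^2}\, dt
\le \sqrt{K_{XX}(0)} \int_{-\infty}^{\infty} \frac{|h(t - \sigma)|}{1 + t^2}\, dt,
\]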


where the inequality follows from (25.16) and from our assumption that (X(t)) is 
centered. We next need to integrate the RHS over o. Invoking Fubini’s Theorem 
to exchange the order of integration over t and o we obtain that the integral of the 
absolute value of the function defined in (25.102) is upper-bounded by 


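\[
\begin{aligned}
\int_{-\infty}^{\infty} \sqrt{K_{XX}(0)} \int_{-\infty}^{\infty} \frac{|h(t - \sigma)|}{1 + t^2}\, dt\, d\sigma
&= \sqrt{K_{XX}(0)} \int_{-\infty}^{\infty} \frac{1}{1 + t^2} \int_{-\infty}^{\infty} |h(t - \sigma)|\, d\sigma\, dt \\
&= \sqrt{K_{XX}(0)}\; \|h\|_1 \int_{-\infty}^{\infty} \frac{1}{1 + t^2}\, dt \\
&= \pi\, \sqrt{K_{XX}(0)}\; \|h\|_1 \\
&< \infty.
\end{aligned}
\]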


Having established that the function in (25.102) is measurable and integrable, we 
can now use Fubini’s Theorem to deduce that its integral over o is measurable as 
a mapping of (w,t), i.e., that the mapping 


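\[
(\omega, t) \mapsto \int_{-\infty}^{\infty} \frac{X(\omega, \sigma)\, h(t - \sigma)}{1 + t^2}\, d\sigma \tag{25.103}
\]
is measurable. Since the RHS of (25.103) is $Y(\omega, t)/(1 + t^2)$, we conclude that the
mapping $(\omega, t) \mapsto Y(\omega, t)/(1 + t^2)$ is measurable and hence also $(\omega, t) \mapsto Y(\omega, t)$.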


We next prove Part (ii) using (25.91) and Proposition 6.2.5. Because h is integrable, 
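its self-similarity function $R_{hh}$ is integrable and of FT
\[
\hat{R}_{hh}(f) = |\hat{h}(f)|^2, \quad f \in \mathbb{R} \tag{25.104}
\]
(Section 11.4). And since, by assumption, (X(t)) is of PSD $S_{XX}$, it follows that $S_{XX}$
is integrable and that its IFT is $K_{XX}$:
\[
K_{XX}(\tau) = \int_{-\infty}^{\infty} S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}. \tag{25.105}
\]
Consequently, by Proposition 6.2.5,
\[
(K_{XX} \star R_{hh})(\tau) = \int_{-\infty}^{\infty} |\hat{h}(f)|^2\, S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R}.
\]
Combining this with (25.91) yields
\[
K_{YY}(\tau) = \int_{-\infty}^{\infty} |\hat{h}(f)|^2\, S_{XX}(f)\, e^{i 2\pi f \tau}\, df, \quad \tau \in \mathbb{R},
\]
and thus establishes that the PSD of (Y(t)) is as given in (25.92).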


We next turn to Part (iii). To establish (25.93) we use the representation (25.96) 
& (25.98) and Theorem 25.12.2: 


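\[
\begin{aligned}
\operatorname{E}[X(t)\, Y(t + \tau)]
&= \operatorname{Cov}\Bigl[ X(t),\; \int_{-\infty}^{\infty} X(\sigma)\, s_2(\sigma)\, d\sigma \Bigr] \\
&= \int_{-\infty}^{\infty} s_2(\sigma)\, K_{XX}(t - \sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} h(t + \tau - \sigma)\, K_{XX}(t - \sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} K_{XX}(-\mu)\, h(\tau - \mu)\, d\mu \\
&= \int_{-\infty}^{\infty} K_{XX}(\mu)\, h(\tau - \mu)\, d\mu \\
&= (K_{XX} \star h)(\tau), \quad \tau \in \mathbb{R},
\end{aligned}
\]
where $\mu = \sigma - t$, and where we have used the symmetry of the autocovariance
function.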
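
To make (25.93) concrete, the sketch below evaluates $(K_{XX} \star h)(\tau)$ numerically and compares it with a closed form. The choices $K_{XX}(\tau) = e^{-|\tau|}$ and $h(t) = e^{-t}\,\mathrm{I}\{t \ge 0\}$, as well as the helper names, are assumptions made for illustration only. For this pair a direct calculation gives $(\tau + \tfrac{1}{2})\, e^{-\tau}$ for $\tau \ge 0$ and $\tfrac{1}{2} e^{\tau}$ for $\tau < 0$.

```python
import math

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def k_xx(mu):
    """Assumed autocovariance function K_XX(mu) = exp(-|mu|)."""
    return math.exp(-abs(mu))

def h(t):
    """Assumed causal impulse response h(t) = exp(-t) for t >= 0."""
    return math.exp(-t) if t >= 0 else 0.0

def cross_cov(tau, span=30.0, n=60000):
    """(K_XX * h)(tau) = int K_XX(mu) h(tau - mu) d mu, truncated to [-span, span]."""
    return midpoint(lambda mu: k_xx(mu) * h(tau - mu), -span, span, n)

def cross_cov_closed(tau):
    """Closed form of (K_XX * h)(tau) for the assumed K_XX and h."""
    if tau >= 0:
        return (tau + 0.5) * math.exp(-tau)
    return 0.5 * math.exp(tau)

for tau in (-0.5, 0.0, 1.0):
    print(tau, cross_cov(tau), cross_cov_closed(tau))
```

Unlike an autocovariance function, this cross-covariance is not symmetric in τ, because the assumed filter is causal.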


Finally, we prove Part (iv). The proof is a simple application of Theorem 25.12.1. 
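To prove that (Y(t)) is a Gaussian process we need to show that, for every positive
integer n and for every choice of the epochs $t_1,\ldots,t_n$, the random variables
$Y(t_1),\ldots,Y(t_n)$ are jointly Gaussian. This follows directly from Theorem 25.12.1
because $Y(t_\nu)$ can be expressed as
\[
\begin{aligned}
Y(t_\nu) &= \int_{-\infty}^{\infty} X(\sigma)\, h(t_\nu - \sigma)\, d\sigma \\
&= \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma, \quad \nu = 1,\ldots,n,
\end{aligned}
\]
where
\[
s_\nu\colon \sigma \mapsto h(t_\nu - \sigma), \quad \nu = 1,\ldots,n
\]
are all integrable.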


The joint Gaussianity of the random variables in (25.94) can also be deduced from
Theorem 25.12.1. Indeed, $X(t_\nu)$ can be trivially expressed as the functional
\[
X(t_\nu) = \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma + \alpha_\nu\, X(t_\nu), \quad \nu = 1,\ldots,n
\]
when $s_\nu$ is chosen to be the zero function and when $\alpha_\nu$ is chosen as 1, and $Y(t_\nu)$
can be similarly expressed as
\[
Y(t_\nu) = \int_{-\infty}^{\infty} X(\sigma)\, s_\nu(\sigma)\, d\sigma + \alpha_\nu\, X(t_\nu), \quad \nu = n+1,\ldots,n+m
\]
when $s_\nu\colon \sigma \mapsto h(t_\nu - \sigma)$ and $\alpha_\nu = 0$.

The mathematically astute reader may have noted that, in defining the result of
passing a WSS SP through a stable filter of impulse response h, we did not preclude
the possibility that for every ω there may be some epochs t for which the mapping
$\sigma \mapsto X(\omega, \sigma)\, h(t - \sigma)$ is not integrable. So far, we have only established that
for every epoch t the set $\mathcal{N}_t$ of ω’s for which this mapping is not integrable is of
probability zero.

We next show that if h is well-behaved in the sense that it is not only integrable
but also satisfies
\[
\int_{-\infty}^{\infty} h^2(t)\, (1 + t^2)\, dt < \infty, \tag{25.106}
\]
then whenever ω is outside some set $\mathcal{N} \subset \Omega$ of probability zero, the mapping
$\sigma \mapsto X(\omega, \sigma)\, h(t - \sigma)$ is integrable for all $t \in \mathbb{R}$. Thus, for ω’s outside this set of
probability zero, we can think of the response of the filter as being the convolution
of the trajectory $t \mapsto X(\omega, t)$ and the impulse response $t \mapsto h(t)$. For such ω’s this
convolution never blows up.

We show this in two steps. In the first step we note that if h satisfies (25.106) and
if the trajectory $t \mapsto X(\omega, t)$ satisfies
\[
\int_{-\infty}^{\infty} \frac{X^2(\omega, t)}{1 + t^2}\, dt < \infty, \tag{25.107}
\]
then the function $\sigma \mapsto X(\omega, \sigma)\, h(t - \sigma)$ is integrable for every $t \in \mathbb{R}$
(Proposition 3.4.4).

In the second step we show that outside a set of ω’s of probability zero, all the
trajectories $t \mapsto X(\omega, t)$ satisfy (25.107):


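Lemma 25.13.3. Let (X(t)) be a WSS measurable SP defined over the probability
space $(\Omega, \mathcal{F}, P)$. Then
\[
\operatorname{E}\Bigl[ \int_{-\infty}^{\infty} \frac{X^2(t)}{1 + t^2}\, dt \Bigr] < \infty, \tag{25.108}
\]
and the set
\[
\Bigl\{ \omega \in \Omega : \int_{-\infty}^{\infty} \frac{X^2(\omega, t)}{1 + t^2}\, dt < \infty \Bigr\} \tag{25.109}
\]
is an event of probability one.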


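Proof. Since (X(t)) is measurable, the mapping
\[
(\omega, t) \mapsto \frac{X^2(\omega, t)}{1 + t^2} \tag{25.110}
\]
is nonnegative and measurable. By Fubini’s Theorem it follows that if we define
\[
W(\omega) \triangleq \int_{-\infty}^{\infty} \frac{X^2(\omega, t)}{1 + t^2}\, dt, \quad \omega \in \Omega, \tag{25.111}
\]
then W is a nonnegative RV taking value in the interval [0, ∞]. Consequently, the
set $\{\omega \in \Omega : W(\omega) < \infty\}$ is measurable. Moreover, by Fubini’s Theorem,
\[
\begin{aligned}
\operatorname{E}[W] &= \operatorname{E}\Bigl[ \int_{-\infty}^{\infty} \frac{X^2(t)}{1 + t^2}\, dt \Bigr] \\
&= \int_{-\infty}^{\infty} \frac{\operatorname{E}[X^2(t)]}{1 + t^2}\, dt \\
&= \operatorname{E}[X^2(0)] \int_{-\infty}^{\infty} \frac{1}{1 + t^2}\, dt \\
&= \pi \operatorname{E}[X^2(0)] \\
&< \infty.
\end{aligned}
\]
Thus, W is a RV taking value in the interval [0, ∞] and having finite expectation,
so the event $\{\omega \in \Omega : W(\omega) < \infty\}$ must be of probability one.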


25.14 The PSD Revisited 


Theorem 25.13.2 describes the PSD of the output of a stable filter that is fed a 
WSS SP (X(t)). By integrating this PSD, we obtain the value at the origin of the 
autocovariance function of the filter’s output (see (25.30)). Since the latter is the 
power of the filter’s output (Corollary 25.9.3), we have: 


Theorem 25.14.1 (Wiener-Khinchin). If a measurable, centered, WSS SP (X(t))
of autocovariance function $K_{XX}$ is passed through a stable filter of impulse response
$h\colon \mathbb{R} \to \mathbb{R}$, then the average power of the filter’s output is given by
\[
\text{Power of } X \star h = \langle K_{XX}, R_{hh} \rangle. \tag{25.112}
\]
If, additionally, (X(t)) is of PSD $S_{XX}$, then this power is given by
\[
\text{Power of } X \star h = \int_{-\infty}^{\infty} S_{XX}(f)\, |\hat{h}(f)|^2\, df. \tag{25.113}
\]


Proof. To prove (25.112), we note that by (25.91) the autocovariance function of
the filtered process is $K_{XX} \star R_{hh}$, which evaluates at the origin to (25.112). The
result thus follows from Proposition 25.9.2, which shows that the power in the
filtered process is given by its autocovariance function evaluated at the origin.

To prove (25.113), we note that $K_{XX}$ is the IFT of $S_{XX}$ and that, by (11.35),
$\hat{R}_{hh}(f) = |\hat{h}(f)|^2$, so the RHS of (25.113) is equal to the RHS of (25.112) by
Proposition 6.2.4.
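
The agreement between (25.112) and (25.113) can be checked numerically on a simple example. The sketch below assumes $K_{XX}(\tau) = e^{-|\tau|}$, whose PSD is $S_{XX}(f) = 2/(1 + 4\pi^2 f^2)$, and the filter $h(t) = e^{-t}\,\mathrm{I}\{t \ge 0\}$, for which $R_{hh}(\tau) = \tfrac{1}{2} e^{-|\tau|}$ and $|\hat{h}(f)|^2 = 1/(1 + 4\pi^2 f^2)$. These choices, the truncation limits, and the helper names are illustrative assumptions, not taken from the text. Both expressions should evaluate to 1/2.

```python
import math

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def k_xx(tau):
    """Assumed autocovariance function."""
    return math.exp(-abs(tau))

def r_hh(tau):
    """Self-similarity function of the assumed h(t) = exp(-t) I{t >= 0}."""
    return 0.5 * math.exp(-abs(tau))

def s_xx(f):
    """PSD corresponding to K_XX(tau) = exp(-|tau|)."""
    return 2.0 / (1.0 + (2.0 * math.pi * f) ** 2)

def h_hat_sq(f):
    """Squared magnitude of the frequency response of the assumed h."""
    return 1.0 / (1.0 + (2.0 * math.pi * f) ** 2)

# (25.112): inner product of K_XX and R_hh (time domain, truncated to [-20, 20])
power_time = midpoint(lambda t: k_xx(t) * r_hh(t), -20.0, 20.0, 40000)

# (25.113): integral of S_XX |h_hat|^2 (frequency domain, truncated to [-50, 50])
power_freq = midpoint(lambda f: s_xx(f) * h_hat_sq(f), -50.0, 50.0, 100000)

print(power_time, power_freq)
```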


We next show that for WSS stochastic processes, the operational PSD (Defini- 
tion 15.3.1) and the PSD (Definition 25.7.2) are equivalent. That is, a WSS SP 
has an operational PSD if, and only if, it has a PSD, and if the two exist, then they 
are equal (outside a set of frequencies of Lebesgue measure zero). Before stating 
this as a theorem, we present a lemma that will be needed in the proof. It is very
much in the spirit of Lemma 15.3.2.

Lemma 25.14.2. Let $g\colon \mathbb{R} \to \mathbb{R}$ be a symmetric continuous function satisfying the
condition that for every integrable real signal $h\colon \mathbb{R} \to \mathbb{R}$
\[
\int_{-\infty}^{\infty} g(t)\, R_{hh}(t)\, dt = 0. \tag{25.114}
\]
Then g is the all-zero function.


Proof. For every a > 0 consider the function
\[
h(t) = \frac{1}{\sqrt{a}}\, \mathrm{I}\{|t| \le a/2\}, \quad t \in \mathbb{R},
\]
whose self-similarity function is
\[
R_{hh}(t) = \Bigl( 1 - \frac{|t|}{a} \Bigr)\, \mathrm{I}\{|t| \le a\}, \quad t \in \mathbb{R}. \tag{25.115}
\]
Since h is integrable, it follows from (25.114) that
\[
\begin{aligned}
0 &= \int_{-\infty}^{\infty} g(t)\, R_{hh}(t)\, dt \\
&= 2 \int_{0}^{\infty} g(t)\, R_{hh}(t)\, dt \\
&= 2 \int_{0}^{a} g(t) \Bigl( 1 - \frac{t}{a} \Bigr)\, dt, \quad a > 0,
\end{aligned}
\tag{25.116}
\]
where the second equality follows from the hypothesis that g(·) is symmetric and
from the symmetry of $R_{hh}$, and where the third equality follows from (25.115).
Defining
\[
G(t) = \int_{0}^{t} g(\xi)\, d\xi, \quad t \ge 0, \tag{25.117}
\]
and using integration by parts, we obtain from (25.116) that
\[
0 = G(\xi) \Bigl( 1 - \frac{\xi}{a} \Bigr) \Big|_{0}^{a} + \frac{1}{a} \int_{0}^{a} G(\xi)\, d\xi, \quad a > 0,
\]
from which we obtain
\[
a\, G(0) = \int_{0}^{a} G(\xi)\, d\xi, \quad a > 0.
\]
Differentiating with respect to a yields
\[
G(0) = G(a), \quad a > 0,
\]
which combines with (25.117) to yield
\[
\int_{0}^{a} g(t)\, dt = 0, \quad a > 0. \tag{25.118}
\]
Differentiating with respect to a and using the continuity of g (Rudin, 1976,
Chapter 6, Theorem 6.20) yields that g(a) is zero for all a > 0 and hence, by its
continuity and symmetry, for all $a \in \mathbb{R}$.
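
The triangular shape in (25.115) is easy to confirm numerically. The following sketch approximates $R_{hh}(\tau) = \int h(t + \tau)\, h(t)\, dt$ for the rectangular pulse used in the proof and compares it with $(1 - |\tau|/a)\, \mathrm{I}\{|\tau| \le a\}$. The value of `a`, the grid size, and the function names are assumptions made for illustration.

```python
import math

def h(t, a):
    """Rectangular pulse of height 1/sqrt(a) supported on [-a/2, a/2]."""
    return 1.0 / math.sqrt(a) if abs(t) <= a / 2.0 else 0.0

def r_hh_numeric(tau, a, n=20000):
    """Midpoint-rule approximation of R_hh(tau) = int h(t + tau) h(t) dt.

    The integrand vanishes outside [-a/2, a/2], so that interval suffices.
    """
    step = a / n
    total = 0.0
    for i in range(n):
        t = -a / 2.0 + (i + 0.5) * step
        total += h(t + tau, a) * h(t, a)
    return step * total

def r_hh_closed(tau, a):
    """Closed form (25.115): a triangle of unit height and base [-a, a]."""
    return max(1.0 - abs(tau) / a, 0.0)

a = 2.0  # assumed pulse width
for tau in (-2.5, -1.0, 0.0, 0.5, 1.5, 2.0):
    print(tau, r_hh_numeric(tau, a), r_hh_closed(tau, a))
```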
Theorem 25.14.3 (The PSD and Operational PSD of a WSS SP). Let (X(t))
be a measurable, centered, WSS SP of a continuous autocovariance function $K_{XX}$.
Let S(·) be a nonnegative, symmetric, integrable function. Then the following two
conditions are equivalent:

(a) $K_{XX}$ is the Inverse Fourier Transform of S(·).

(b) For every integrable $h\colon \mathbb{R} \to \mathbb{R}$, the power in $X \star h$ is given by
\[
\text{Power of } X \star h = \int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df. \tag{25.119}
\]


Proof. That (a) implies (b) follows from the Wiener-Khinchin Theorem because 
(a) implies that (X(t)) is of PSD S(-). It remains to prove that (b) implies (a). 
To this end we now assume that Condition (b) is satisfied and proceed to prove 
that $K_{XX}$ must then be equal to the IFT of S(·). By Theorem 25.14.1, the power
in $X \star h$ is given by (25.112). Consequently, Condition (b) implies that
\[
\int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df = \int_{-\infty}^{\infty} K_{XX}(\tau)\, R_{hh}(\tau)\, d\tau \tag{25.120}
\]
for every integrable $h\colon \mathbb{R} \to \mathbb{R}$.

If h is integrable, then the FT of $R_{hh}$ is the mapping $f \mapsto |\hat{h}(f)|^2$ (see (11.35)). If,
in addition, h is a real signal, then $R_{hh}$ is a symmetric function, and its IFT is thus
identical to its FT (Proposition 6.2.3 (ii)). Thus, if h is real and integrable, then
the IFT of $R_{hh}$ is the mapping $f \mapsto |\hat{h}(f)|^2$. (Using the dummy variable f for the
IFT is unusual but legitimate.) Consequently, by Proposition 6.2.4 (applied with
the substitution of S(·) for x and of $R_{hh}$ for g),
\[
\int_{-\infty}^{\infty} S(f)\, |\hat{h}(f)|^2\, df = \int_{-\infty}^{\infty} \hat{S}(\tau)\, R_{hh}(\tau)\, d\tau. \tag{25.121}
\]
By (25.120) & (25.121) and by the symmetry of S(·) (which implies that $\hat{S} = \check{S}$)
we obtain that
\[
\int_{-\infty}^{\infty} \bigl( \check{S}(\tau) - K_{XX}(\tau) \bigr)\, R_{hh}(\tau)\, d\tau = 0, \quad h \in \mathcal{L}_1. \tag{25.122}
\]
It thus follows from Lemma 25.14.2 that the mapping $\tau \mapsto \check{S}(\tau) - K_{XX}(\tau)$ is the
all-zero function, and Condition (a) is established.


25.15 White Gaussian Noise 


The most important continuous-time SP in Digital Communications is white 
Gaussian noise, which is often used to model the additive noise in communi- 
cation systems. In this section we define this process and study its key properties. 
Our definition differs from the one in most textbooks, most notably in that we de- 
fine white Gaussian noise only with respect to some given bandwidth W. We give 
our reasons and comment on the implications in Section 25.15.2 after providing 
our definition and deriving the key results.

[Figure: plot of $S_{NN}(f)$ versus $f$.]

Figure 25.3: The PSD of a SP (N(t)) which is of double-sided spectral density
$N_0/2$ with respect to the bandwidth W.


25.15.1 Definition and Main Properties 


The parameters defining white Gaussian noise are the bandwidth W with respect 
to which the process is white and the double-sided spectral density No/2. 


Definition 25.15.1 (White Gaussian Noise). We say that (N(t)) is white Gaussian
noise of double-sided spectral density $N_0/2$ with respect to the bandwidth W
if (N(t)) is a measurable, stationary, centered, Gaussian SP that has a
PSD $S_{NN}$ satisfying
\[
S_{NN}(f) = \frac{N_0}{2}, \quad f \in [-W, W]. \tag{25.123}
\]
An example of the PSD of white Gaussian noise of double-sided spectral density
$N_0/2$ with respect to the bandwidth W is depicted in Figure 25.3. Note that
our definition of white Gaussian noise only specifies the PSD for frequencies f
satisfying $|f| \le W$. We leave the value of the PSD at other frequencies unspecified.
But the PSD should, of course, be a valid PSD, i.e., it must be nonnegative, symmetric,
and integrable (Definition 25.7.2). Recall also that by Proposition 25.7.3
every nonnegative, symmetric, integrable function is the PSD of some measurable
stationary Gaussian SP.¹¹


The following proposition summarizes the key properties of white Gaussian noise. 
The reader is encouraged to recall the definition of an integrable function that is 
bandlimited to W Hz (Definition 6.4.9); the definition of the inner product between 
two energy-limited real signals (3.1); the definition of $\|s\|_2$ as $\sqrt{\langle s, s \rangle}$; and the
definition of orthonormality of the functions $\phi_1,\ldots,\phi_m$ (Definition 4.6.1).


Proposition 25.15.2 (Key Properties of White Gaussian Noise). Let (N(t)) be 
white Gaussian noise of double-sided spectral density No/2 with respect to the band- 
width W. 


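(i) If s is any integrable function that is bandlimited to W Hz, then
\[
\int_{-\infty}^{\infty} N(t)\, s(t)\, dt \sim \mathcal{N}\Bigl( 0, \frac{N_0}{2}\, \|s\|_2^2 \Bigr).
\]

(ii) If $s_1,\ldots,s_m$ are integrable functions that are bandlimited to W Hz, then the
m random variables
\[
\int_{-\infty}^{\infty} N(t)\, s_1(t)\, dt,\; \ldots,\; \int_{-\infty}^{\infty} N(t)\, s_m(t)\, dt
\]
are jointly Gaussian centered random variables of covariance matrix
\[
\frac{N_0}{2}
\begin{pmatrix}
\langle s_1, s_1 \rangle & \langle s_1, s_2 \rangle & \cdots & \langle s_1, s_m \rangle \\
\langle s_2, s_1 \rangle & \langle s_2, s_2 \rangle & \cdots & \langle s_2, s_m \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle s_m, s_1 \rangle & \langle s_m, s_2 \rangle & \cdots & \langle s_m, s_m \rangle
\end{pmatrix}.
\]

(iii) If $\phi_1,\ldots,\phi_m$ are integrable functions that are bandlimited to W Hz and are
orthonormal, then the random variables
\[
\int_{-\infty}^{\infty} N(t)\, \phi_1(t)\, dt,\; \ldots,\; \int_{-\infty}^{\infty} N(t)\, \phi_m(t)\, dt
\]
are IID $\mathcal{N}(0, N_0/2)$.

(iv) If s is any integrable function that is bandlimited to W Hz, and if $K_{NN}$ is the
autocovariance function of (N(t)), then
\[
K_{NN} \star s = \frac{N_0}{2}\, s. \tag{25.124}
\]

(v) If s is an integrable function that is bandlimited to W Hz, then for every
epoch $t \in \mathbb{R}$
\[
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} N(\sigma)\, s(\sigma)\, d\sigma,\; N(t) \Bigr] = \frac{N_0}{2}\, s(t). \tag{25.125}
\]

¹¹As we have noted in the paragraph preceding Definition 25.9.1, Proposition 25.7.3 can
be strengthened to also guarantee measurability. Every nonnegative, symmetric, and integrable
function is the PSD of some measurable, stationary, and Gaussian SP whose autocovariance
function is continuous.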


Proof. Parts (i) and (iii) are special cases of Part (ii), so it suffices to prove 
Parts (ii), (iv), and (v). We begin with Part (ii). We first note that since $\{s_j\}$
are assumed to be integrable and bandlimited to W Hz, and since Note 6.4.12
guarantees that every bandlimited integrable signal is also of finite energy, it follows
that the functions $\{s_j\}$ are energy-limited and the inner products $\langle s_j, s_k \rangle$ are
well-defined. By (25.89)
\[
\begin{aligned}
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} N(t)\, s_j(t)\, dt,\; \int_{-\infty}^{\infty} N(t)\, s_k(t)\, dt \Bigr]
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}_j(f)\, \hat{s}_k^{*}(f)\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}_j(f)\, \hat{s}_k^{*}(f)\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}_j(f)\, \hat{s}_k^{*}(f)\, df \\
&= \frac{N_0}{2}\, \langle s_j, s_k \rangle, \quad j, k \in \{1,\ldots,m\},
\end{aligned}
\]
where the second equality follows because $s_j$ and $s_k$ are bandlimited to W Hz; the
third from (25.123); and the final equality from Parseval’s Theorem.


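To prove Part (iv), we start with the definition of the convolution and compute
\[
\begin{aligned}
(K_{NN} \star s)(t)
&= \int_{-\infty}^{\infty} s(\tau)\, K_{NN}(t - \tau)\, d\tau \\
&= \int_{-\infty}^{\infty} s(\tau) \int_{-\infty}^{\infty} S_{NN}(f)\, e^{i 2\pi f (t - \tau)}\, df\, d\tau \\
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2}\, s(t), \quad t \in \mathbb{R},
\end{aligned}
\]
where the second equality follows from the definition of the PSD of (N(t))
(Definition 25.7.2); the third by Proposition 6.2.5; the fourth because s is, by assumption,
bandlimited to W Hz (Proposition 6.4.10 cf. (c)); the fifth from our assumption
that (N(t)) is white with respect to the bandwidth W (25.123); and the final
equality from Proposition 6.4.10 (cf. (b)).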


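Part (v) now follows from (25.85) and Part (iv). Alternatively, it can be proved
using (25.88) and (25.123) as follows:
\[
\begin{aligned}
\operatorname{Cov}\Bigl[ \int_{-\infty}^{\infty} N(\sigma)\, s(\sigma)\, d\sigma,\; N(t) \Bigr]
&= \int_{-\infty}^{\infty} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \int_{-W}^{W} S_{NN}(f)\, \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2} \int_{-W}^{W} \hat{s}(f)\, e^{i 2\pi f t}\, df \\
&= \frac{N_0}{2}\, s(t), \quad t \in \mathbb{R},
\end{aligned}
\]
where the first equality follows from (25.88); the second because s is bandlimited
to W Hz (Proposition 6.4.10 cf. (c)); the third from (25.123); and the last from
Proposition 6.4.10 (cf. (b)).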
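
As a small worked instance of Part (i), consider the signal $s(t) = \operatorname{sinc}^2(t)$ with $\operatorname{sinc}(t) = \sin(\pi t)/(\pi t)$. It is integrable and bandlimited to W = 1 Hz, and its FT is the triangle $\hat{s}(f) = (1 - |f|)\, \mathrm{I}\{|f| \le 1\}$. The sketch below computes the variance $(N_0/2)\,\|s\|_2^2$ in the time domain and the integral of $(N_0/2)\,|\hat{s}(f)|^2$ over $[-W, W]$ in the frequency domain; both should equal $N_0/3$. The signal, the value `N0 = 2.0`, the truncation of the time integral, and the helper names are assumptions chosen for illustration.

```python
import math

def midpoint(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    step = (b - a) / n
    return step * sum(f(a + (i + 0.5) * step) for i in range(n))

def sinc(t):
    """Normalized sinc: sin(pi t) / (pi t), with sinc(0) = 1."""
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)

def s(t):
    """Assumed signal sinc^2(t): integrable and bandlimited to 1 Hz."""
    return sinc(t) ** 2

def s_hat(f):
    """FT of sinc^2: a unit-height triangle supported on [-1, 1]."""
    return max(1.0 - abs(f), 0.0)

N0 = 2.0  # assumed noise level, so that N0 / 2 = 1
W = 1.0   # bandwidth with respect to which the noise is white

# Time domain: (N0 / 2) * ||s||_2^2, with the integral truncated to [-50, 50]
var_time = (N0 / 2.0) * midpoint(lambda t: s(t) ** 2, -50.0, 50.0, 100000)

# Frequency domain: int_{-W}^{W} (N0 / 2) |s_hat(f)|^2 df, as in (25.87) with (25.123)
var_freq = midpoint(lambda f: (N0 / 2.0) * s_hat(f) ** 2, -W, W, 20000)

print(var_time, var_freq, N0 / 3.0)
```

Only the values of the PSD on [-W, W] enter the frequency-domain computation, which is why the definition can leave the PSD unspecified elsewhere.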


25.15.2 Other Definitions 


As we noted earlier, our definition of white Gaussian noise is different from the one 
given in most textbooks on Digital Communications. The key difference is that we 
define whiteness with respect to a certain bandwidth W, whereas most textbooks 
do not add this qualifier. Thus, while we require that the PSD $S_{NN}(f)$ be equal
to $N_0/2$ only for frequencies f satisfying $|f| \le W$ (leaving $S_{NN}(f)$ unspecified at
other frequencies), other textbooks require that $S_{NN}(f)$ be equal to $N_0/2$ for all
frequencies $f \in \mathbb{R}$. With our definition of white noise we can only prove that
(25.124) holds for integrable signals that are bandlimited to W Hz, whereas with
the other textbooks’ definition one could presumably derive this relationship for
all integrable functions.


We prefer our definition because there does not exist a Gaussian SP (N(t)) whose
PSD is equal to $N_0/2$ at all frequencies. Indeed, the function of frequency that is
equal to No/2 at all frequencies is not integrable and therefore does not qualify 
as a PSD (Definition 25.7.2). Were such a PSD to exist, we would obtain from 
(25.30) that such a process would have infinite variance and thus be neither WSS 
(Definition 25.4.2) nor Gaussian (Note 25.3.2). 
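
To see how the variance grows, it helps to look at a family of valid PSDs that are flat on ever wider bands. The following calculation is an illustrative sketch: the rectangular PSD and the cutoff B are assumptions introduced here, not part of the definitions above.

```latex
% Assumed PSD: flat at N_0/2 on [-B, B] and zero elsewhere.
\[
S_{NN}(f) = \frac{N_0}{2}\, \mathrm{I}\{|f| \le B\}
\quad\Longrightarrow\quad
K_{NN}(\tau) = \int_{-B}^{B} \frac{N_0}{2}\, e^{i 2\pi f \tau}\, df
= N_0 B\, \frac{\sin(2\pi B \tau)}{2\pi B \tau}, \quad \tau \neq 0,
\]
\[
\operatorname{Var}[N(t)] = K_{NN}(0) = \int_{-B}^{B} \frac{N_0}{2}\, df = N_0 B
\;\longrightarrow\; \infty \quad \text{as } B \to \infty.
\]
```

For every finite B this is a legitimate PSD and the process is white with respect to any bandwidth $W \le B$, but no limiting process of finite variance exists as B grows without bound.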


Requiring that (25.124) hold for all integrable (continuous) signals would require
that $K_{NN}$ be given by the product of $N_0/2$ and Dirac’s delta, which opens a whole
can of worms. Nevertheless, the reader should be aware that in some books white 
noise is defined as a centered, stationary Gaussian noise whose autocovariance 
function is given by Dirac’s Delta scaled by No/2 or, equivalently, whose PSD is 
equal to No/2 at all frequencies. 


25.15.3 White Noise in Passband


Definition 25.15.3 (White Gaussian Noise in Passband). We say that (N(t)) is
white Gaussian noise of double-sided power spectral density $N_0/2$ with
respect to the bandwidth W around the carrier frequency $f_c$ if (N(t)) is a
centered, measurable, stationary, Gaussian process that has a PSD $S_{NN}$ satisfying
\[
S_{NN}(f) = \frac{N_0}{2}, \quad \bigl|\, |f| - f_c \bigr| \le \frac{W}{2}, \tag{25.126}
\]
and if $f_c > W/2$.


Note 25.15.4. For white Gaussian noise with respect to the bandwidth W around
the carrier frequency $f_c$, all the claims of Proposition 25.15.2 hold provided that
we replace the requirement that the functions $s$, $\{s_j\}$, and $\{\phi_j\}$ be integrable functions
that are bandlimited to W Hz with the requirement that they be integrable
functions that are bandlimited to W Hz around the carrier frequency $f_c$.


25.16 Exercises 


Exercise 25.1 (Constructing a SP from a RV). Let W be a standard Gaussian RV. Define 
the continuous-time SP (X(t)) by 


X(t)=ellW, teR. 


(i) Is (X(t)) a stationary SP?
(ii) Is (X(t)) a Gaussian SP?

Exercise 25.2 (Delaying and Adding). Let (X(t)) be a stationary Gaussian SP of mean $\mu_X$
and autocovariance function $K_{XX}$. Define
\[
Y(t) = X(t) + X(t - t_D), \quad t \in \mathbb{R},
\]
where $t_D \in \mathbb{R}$ is deterministic.
(i) Is (Y(t)) a Gaussian SP?
(ii) Compute the mean and the autocovariance function of (Y(t)).
(iii) Is (Y(t)) stationary?


Exercise 25.3 (Random Variables and Stochastic Processes). Let the random variables X
and Y be IID $\mathcal{N}(0, \sigma^2)$, and let
\[
Z(t) = X \cos(2\pi t) + Y \sin(2\pi t), \quad t \in \mathbb{R}.
\]
(i) Is Z(0.2) Gaussian?
(ii) Is (Z(t)) a Gaussian SP?
(iii) Is it stationary?


Exercise 25.4 (Stochastic Processes through Nonlinearities). 
(i) Let (X(t)) be a stationary SP and let
\[
Y(t) = g(X(t)), \quad t \in \mathbb{R},
\]
where $g\colon \mathbb{R} \to \mathbb{R}$ is some (Borel measurable) deterministic function. Show that the
SP (Y(t)) is stationary. Under what conditions is (Y(t)) WSS?

(ii) Let (X(t)) be a centered stationary Gaussian SP of autocovariance function $K_{XX}$.
Let Y(t) = sgn(X(t)), where sgn(ξ) is equal to +1 whenever ξ > 0 and is equal
to −1 otherwise. Is (Y(t)) centered? Is it WSS? If so, what is its autocovariance
function?


Hint: For Part (ii) recall Exercise 23.18. 


Exercise 25.5 (WSS Stochastic Processes). Let A and B be IID random variables taking
on the values ±1 equiprobably. Define the SP (Z(t)) as
\[
Z(t) = A \cos(2\pi t) + B \sin(2\pi t), \quad t \in \mathbb{R}.
\]
(i) Is the SP (Z(t)) WSS?
(ii) Define the SP (W(t)) by $W(t) = Z^2(t)$. Is (W(t)) WSS?


Exercise 25.6 (Valid Autocovariance Functions). Let Kxx and Kyy be the autocovariance 
functions of some WSS stochastic processes (X(t)) and (Y(t)). 


(i) Show that Kxx + Kyy is an autocovariance function of some WSS SP. 


(ii) Repeat for $\tau \mapsto K_{XX}(\tau)\, K_{YY}(\tau)$.


Exercise 25.7 (Time Reversal). Let $K_{XX}$ be the autocovariance function of some WSS
SP $(X(t),\, t \in \mathbb{R})$. Is the time-reversed SP $(\omega, t) \mapsto X(\omega, -t)$ WSS? If so, express its
autocovariance function in terms of $K_{XX}$.

Exercise 25.8 (Classifying Stochastic Processes). Let (X(t)) and (Y(t)) be independent
centered stationary Gaussian stochastic processes of unit variance and autocovariance
functions $K_{XX}$ and $K_{YY}$. Define the stochastic processes (S(t)), (T(t)), (U(t)), (V(t)),
and (W(t)) at every $t \in \mathbb{R}$ as:
\[
\begin{aligned}
S(t) &= X(t) + Y(t + \tau_1), & T(t) &= X(t)\, Y(t + \tau_2), \\
U(t) &= X(t) + X(t + \tau_3), & V(t) &= X(t)\, X(t + \tau_4), \\
W(t) &= X(t) + X(-t),
\end{aligned}
\]
where $\tau_1, \tau_2, \tau_3, \tau_4 \in \mathbb{R}$ are deterministic. Which of these stochastic processes is Gaussian?
Which is WSS? Which is stationary?


Exercise 25.9 (A Linear Functional of a Gaussian SP). Let $(X(t),\, t \in \mathbb{R})$ be a measurable
stationary Gaussian SP of mean 2 and of autocovariance function $K_{XX}\colon \tau \mapsto \exp(-|\tau|)$.
Compute
\[
\Pr\Bigl[ \int_{0}^{2} X(t)\, dt > 2 \Bigr].
\]


Exercise 25.10 (Two Filters). Let (X(t)) be a centered stationary Gaussian SP of
autocovariance function $K_{XX}$ and PSD $S_{XX}$. Define
\[
(Y(t)) = (X(t)) \star h_1, \qquad (Z(t)) = (X(t)) \star h_2,
\]
where $h_1, h_2 \in \mathcal{L}_1$. Thus, (Y(t)) is the result of passing (X(t)) through a stable filter of
impulse response $h_1$ and similarly (Z(t)).
(i) What is the joint distribution of $Y(t_1)$ and $Z(t_2)$ for given epochs $t_1, t_2 \in \mathbb{R}$?

(ii) Give a necessary and sufficient condition on $h_1$, $h_2$, and $S_{XX}$ for Y(17) to be
independent of Z(17).

(iii) Give a necessary and sufficient condition on $h_1$, $h_2$, and $S_{XX}$ for (Z(t)) to be
independent of (Y(t)).


Exercise 25.11 (Linear Functionals of White Gaussian Noise). Find the distribution of
\[
\int_{0}^{T_s} N(t)\, dt \quad \text{and of} \quad \int_{0}^{\infty} e^{-t}\, N(t)\, dt
\]
when $(N(t),\, t \in \mathbb{R})$ is white Gaussian noise of double-sided PSD $N_0/2$ with respect to
the bandwidth of interest. (Ignore the fact that the mappings $t \mapsto \mathrm{I}\{0 \le t \le T_s\}$ and
$t \mapsto e^{-t}\, \mathrm{I}\{t \ge 0\}$ are not bandlimited.)


Exercise 25.12 (Approximately White SP). Let $(X(t),\, t \in \mathbb{R})$ be a measurable, centered,
stationary, Gaussian SP of autocovariance function
\[
K_{XX}(\tau) = \frac{B N_0}{4}\, e^{-B |\tau|}, \quad \tau \in \mathbb{R},
\]


where No, B > 0 are given constants. Throughout this problem No is fixed. 


(i) Plot Kxx for several values of B. What does Kxx look like when B > 1? Show 
that Kxx(7) > 0 for all 7 € R; that 


i; Kxx(t) dr = ~; 


—oco 
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and that for every 6 > 0, 


5 
. No 
Jim ae Kxx(T) dt = ee 

(In this sense, Kxx approximates Dirac’s Delta scaled by No/2 when B is large.) 
(ii) Compute $E[X^2(t)]$. Plot this as a function of B, with $N_0$ held fixed. What happens
when $B \gg 1$?

(iii) Compute the PSD $S_{XX}$. Plot it for several values of B. What does it look like when
$B \gg 1$?


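(iv) For the orthonormal signals defined for every $t \in \mathbb{R}$ by
\[
\phi_1(t) = \begin{cases} 1 & \text{if } 0 \le t \le 1, \\ 0 & \text{otherwise,} \end{cases}
\qquad
\phi_2(t) = \begin{cases} 1 & \text{if } 0 \le t \le \tfrac{1}{2}, \\ -1 & \text{if } \tfrac{1}{2} < t \le 1, \\ 0 & \text{otherwise,} \end{cases}
\]
compute $E[\langle X, \phi_1 \rangle \langle X, \phi_2 \rangle]$. What happens to this expression when $B \gg 1$?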


Chapter 26 


Detection in White Gaussian Noise 


26.1 Introduction


In this chapter we finally address the detection problem in continuous time. The
setup is described in Section 26.2. The key result of this chapter is that, even
though in this setup the observation consists of a stochastic process (i.e., a continuum
of random variables), the problem can be reduced without loss of optimality
to a finite-dimensional problem where the observation consists of a random vector.
Before stating this result precisely in Section 26.4, we shall take a detour in
Section 26.3 to discuss the definition of sufficient statistics when the observation
consists of a continuous-time SP. The proof of the main result is delayed until
Section 26.8. In Section 26.5 we analyze the conditional law of the sufficient statistic
vector under each of the hypotheses. This analysis enables us in Section 26.6 to
derive an optimal guessing rule and in Section 26.7 to analyze its performance.
Section 26.9 addresses the front-end filter, which is a critical element of any practical
implementation of the decision rule. Extensions to passband detection are then
described in Section 26.10, followed by some examples in Section 26.11. Section 26.12
treats the problem of detection in “colored” noise, and the chapter concludes with
a discussion of the detection problem for mean signals that are not bandlimited.


26.2 Setup 


A discrete random variable M (“message”) takes value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$,
where $\mathsf{M} \ge 2$, according to the a priori probabilities
\[
\pi_m = \Pr[M = m], \quad m \in \mathcal{M}, \tag{26.1}
\]
where $\pi_1, \ldots, \pi_{\mathsf{M}}$ are positive¹
\[
\pi_m > 0, \quad m \in \mathcal{M}, \tag{26.2}
\]
and sum to one
\[
\sum_{m \in \mathcal{M}} \pi_m = 1. \tag{26.3}
\]

¹There is no loss in generality in addressing the detection problem only for strictly positive
priors. Hypotheses that have a zero prior can be ignored at the receiver without loss in optimality.



The observation consists of the continuous-time SP $(Y(t), t \in \mathbb{R})$, which, conditional
on M = m, can be expressed as
\[
Y(t) = s_m(t) + N(t), \quad t \in \mathbb{R}, \tag{26.4}
\]
where the “mean signals” $s_1, \ldots, s_{\mathsf{M}}$ are real, deterministic, integrable signals that
are bandlimited to W Hz (Definition 6.4.9), and where the “noise” (N(t)) is independent
of M and is white Gaussian noise of double-sided spectral density $N_0/2$
with respect to the bandwidth W (Definition 25.15.1). Based on the observation
(Y(t)) we wish to guess M with the smallest possible probability of error.²


26.3 Sufficient Statistics when Observing a SP 


The definition of sufficient statistics for the infinite-dimensional hypothesis testing 
problem where the observation consists of a SP is conceptually very similar to the 
definition in the finite-dimensional case where the observation consists of a random 
vector (Definition 22.2.1). But some new technical difficulties do arise. Foremost is 
that we cannot speak of the probability density function (in the usual sense) of the 
observation given each of the hypotheses.³ Consequently, we need a new definition
that does not involve such densities. 


26.3.1 Definition of Sufficient Statistics 


Loosely speaking, a sufficient statistic for guessing a RV M taking value in the 
finite set M based on an observation consisting of a SP (Y(t)) is a random vector 
$T = (T^{(1)}, \ldots, T^{(d)})^{\mathsf{T}}$ that satisfies two conditions. The first is that it can be
computed from the observed SP, and the second is that, once we are given T, any
finite number $n \in \mathbb{N}$ of samples $Y(t_1), \ldots, Y(t_n)$ of the observation are irrelevant
for guessing M. Thus, once T has been revealed to us, our optimal guess for M 
will not be improved if we are additionally given the values of (Y(t)) at any finite 
number of (deterministic) epochs. 


Recall the definition of the σ-algebra generated by the SP (Y(t)) (Definition 25.2.2)
and the definition of irrelevant data (Definition 22.5.1). 


Definition 26.3.1 (Sufficient Statistic: Observable SP). We say that the random 
vector T forms a sufficient statistic for guessing the RV M taking value in the 
finite set M based on the observed SP (Y(t)) if the following two conditions hold:


²In mathematical terms we are looking for a mapping from the set of all sample-paths of
(Y(t)) to M that is measurable with respect to the σ-algebra generated by (Y(t)) and that
minimizes the probability of error among all such functions.

³One could, instead, speak of the Radon-Nikodym derivative with respect to a reference
measure, but we prefer not to pursue this approach.

1) T is measurable with respect to the σ-algebra generated by (Y(t)).


2) For every $n \in \mathbb{N}$ and every choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$, the n-tuple
$(Y(t_1), \ldots, Y(t_n))$ is irrelevant for guessing M based on T.


Condition 2) is equivalent to
\[
M \multimap\!- T \multimap\!- \bigl(Y(t_1), \ldots, Y(t_n)\bigr) \tag{26.5}
\]


forming a Markov chain for any prior on M. 


As we shall see in Section 26.4, such a sufficient statistic can always be found for 
the setup described in Section 26.2. 


26.3.2 Consequences of Sufficiency 


It would have been nice if, in analogy with Proposition 22.2.2, we could have said 
that if T forms a sufficient statistic for guessing M based on the observed SP (Y(t)), 
then the best performance in guessing M based on (Y(t)) can be achieved by a 
decision rule that bases its decision on T. This statement is almost correct, but it 
requires a qualification. 


A pathological example that demonstrates the need for a qualification is the following.
Suppose that M takes on the values 1 and 2 equiprobably and that R is a
RV that is independent of M and that has a density. For example, R could be a
mean-one exponential. Suppose further that, conditional on M = 1, the observed
SP (Y(t)) is deterministically zero, and that, conditional on M = 2, the observed
SP is zero at all times $t \in \mathbb{R}$ except at time R when it takes on the value 1. In
this case the conditional law of $(Y(t_1), \ldots, Y(t_n))$ does not depend on whether the
conditioning is on M = 1 or on M = 2. Thus, if we define the RV T to equal 17
deterministically, then T forms a sufficient statistic for guessing M based on (Y(t)).
The smallest probability of a guessing error based on T is 1/2.⁴ Nevertheless, a
detector that guesses “M = 1” if the observed trajectory is the all-zero function
and “M = 2” if the observed trajectory is discontinuous is correct with probability
one.


It is interesting to note that the latter guessing rule is not measurable with respect
to the σ-algebra generated by (Y(t)). As the next theorem demonstrates, the qualifier
that we need to add is that we only consider guessing rules that are measurable
with respect to the σ-algebra generated by (Y(t)). Barring this qualifier, if T is
sufficient, then there is no loss in optimality in basing our guess on T only.


Theorem 26.3.2. Consider the multi-hypothesis testing problem of guessing a RV M
taking value in the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$ based on an observation consisting of a
SP $(Y(t), t \in \mathbb{R})$. Let T be a random vector that forms a sufficient statistic for
guessing M based on (Y(t)). Then no decision rule that is measurable with respect
to the σ-algebra generated by (Y(t)) can have a lower probability of error than an
optimal rule for guessing M based on T.


⁴This is also the smallest probability of error in guessing M based on $(Y(t_1), \ldots, Y(t_n))$,
irrespective of the (finite) value of the positive integer n and of the (deterministic) choice of the
epochs $t_1, \ldots, t_n$.

Proof. Let $\phi(\cdot)$ be any decision rule that is measurable with respect to the σ-algebra
generated by (Y(t)), i.e., a decision rule whose disjoint decision sets
\[
\mathcal{D}_m = \phi^{-1}(m), \quad m \in \mathcal{M}
\]
are all measurable with respect to this σ-algebra. The conditional probability that
the rule $\phi(\cdot)$ guesses correctly is
\[
\Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr) = \Pr\bigl[(Y(t)) \in \mathcal{D}_m \bigm| M = m\bigr], \quad m \in \mathcal{M}. \tag{26.6}
\]
We shall show that $\phi(\cdot)$ can be approximated by a decision rule $\tilde{\phi}(\cdot)$ that bases its
decision on a finite number of samples $Y(t_1), \ldots, Y(t_n)$, where $n \in \mathbb{N}$ and where
$t_1, \ldots, t_n \in \mathbb{R}$ are deterministic epochs. The approximation is in the sense that,
conditional on each $m \in \mathcal{M}$, the probability of success of $\tilde{\phi}(\cdot)$ is within $\epsilon$ of that
of $\phi(\cdot)$. We shall then show that the best decision rule based on T is at least as
good as $\tilde{\phi}(\cdot)$ and is thus also within $\epsilon$ of $\phi(\cdot)$. Since these steps will be performed
for an arbitrary $\epsilon > 0$, and since the performance of the best decoder based on T
does not depend on $\epsilon$, this will demonstrate that $\phi(\cdot)$ is no better than the best
decision rule based on T. And since $\phi(\cdot)$ here is an arbitrary measurable decision
rule, it will follow that no measurable decision rule can outperform an optimal rule
based on T and the theorem will be proved.


To follow this outline we first need some basic set-theoretic notation. Given two 
sets A and B we denote by A \ B the set consisting of those elements of A that 
are not in B. We denote by $A \triangle B$ the symmetric set difference between A and B
consisting of those elements that are in one of the sets but not in the other. Thus,⁵
\[
A \triangle B = (A \setminus B) \cup (B \setminus A).
\]


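A standard result from Measure Theory (Halmos, 1950, Exercise (8), Section 14)
guarantees that for every $\epsilon > 0$ there exist epochs $t_1, \ldots, t_n \in \mathbb{R}$ and sets $\mathcal{D}'_1, \ldots, \mathcal{D}'_{\mathsf{M}}$
(not necessarily disjoint) that are all measurable with respect to the σ-algebra
generated by $Y(t_1), \ldots, Y(t_n)$ and such that
\[
\Pr\bigl[(Y(t)) \in \mathcal{D}'_m \triangle \mathcal{D}_m \bigm| M = m'\bigr] < \frac{\epsilon}{\mathsf{M}}, \quad m, m' \in \mathcal{M}. \tag{26.7}
\]
Define now the disjoint sets $\tilde{\mathcal{D}}_1, \ldots, \tilde{\mathcal{D}}_{\mathsf{M}}$ inductively by defining $\tilde{\mathcal{D}}_1 = \mathcal{D}'_1$ and
\[
\tilde{\mathcal{D}}_m = \mathcal{D}'_m \setminus \bigcup_{m' < m} \tilde{\mathcal{D}}_{m'}, \quad m \in \{2, 3, \ldots, \mathsf{M}\}. \tag{26.8}
\]

By construction, these sets are disjoint. And because $\mathcal{D}'_1, \ldots, \mathcal{D}'_{\mathsf{M}}$ are measurable
with respect to the σ-algebra generated by $Y(t_1), \ldots, Y(t_n)$, so are $\tilde{\mathcal{D}}_1, \ldots, \tilde{\mathcal{D}}_{\mathsf{M}}$.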


⁵As an aside we mention that the indicator functions of the sets A, B and $A \triangle B$ are related
via the relation
\[
I\{x \in A \triangle B\} = I\{x \in A\} \oplus I\{x \in B\},
\]
where $\oplus$ denotes exclusive-or, i.e., mod-2 addition ($0 \oplus 0 = 0$, $0 \oplus 1 = 1$, $1 \oplus 0 = 1$, and $1 \oplus 1 = 0$).
This relationship simplifies the proof of some of the key properties of the symmetric set difference,
especially when combined with the analogous relation for intersection
\[
I\{x \in A \cap B\} = I\{x \in A\} \cdot I\{x \in B\},
\]
where on the RHS of the above we use mod-2 multiplication.

We next consider a decoder $\tilde{\phi}(\cdot)$ that guesses “M = m” whenever the sample-path
of (Y(t)) is in the set $\tilde{\mathcal{D}}_m$. (If the sample-path of (Y(t)) does not fall in any of the
sets $\{\tilde{\mathcal{D}}_m\}$, then the decoder produces an error flag.) This decoder bases its guess
only on $Y(t_1), \ldots, Y(t_n)$ and yet, as we shall next show, succeeds with probability
that is at least within $\epsilon$ of the probability of success of $\phi(\cdot)$, i.e.,
\[
\Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) > \Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr) - \epsilon, \quad m \in \mathcal{M}. \tag{26.9}
\]
This will imply, in particular, that when averaged over M
\[
\Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct}\bigr) > \Pr\bigl(\phi(\cdot) \text{ is correct}\bigr) - \epsilon, \tag{26.10}
\]
irrespective of the prior on M. But by (26.5) an optimal decision rule based on T
is at least as good as $\tilde{\phi}(\cdot)$ and is thus also within $\epsilon$ of $\phi(\cdot)$.⁶ Since $\epsilon$ is arbitrary, it
follows that an optimal decision rule based on T is at least as good as $\phi(\cdot)$, thus
proving the theorem.


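To complete the proof it thus remains to prove (26.9). To that end we note that,
since for any sets A and B we have $A \supseteq B \cap A$, it follows that $\tilde{\mathcal{D}}_m \supseteq \mathcal{D}_m \cap \tilde{\mathcal{D}}_m$ and
hence, by (26.8),
\begin{align}
\tilde{\mathcal{D}}_m &\supseteq (\mathcal{D}_m \cap \mathcal{D}'_m) \setminus \bigcup_{m' < m} \bigl( \mathcal{D}_{m'} \cup (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \bigr) \tag{26.11} \\
&= (\mathcal{D}_m \cap \mathcal{D}'_m) \setminus \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \tag{26.12} \\
&= \mathcal{D}_m \setminus \Bigl( (\mathcal{D}_m \setminus \mathcal{D}'_m) \cup \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \Bigr), \tag{26.13}
\end{align}
where (26.11) follows because $\mathcal{D}_{m'} \cup (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) = \mathcal{D}_{m'} \cup \mathcal{D}'_{m'} \supseteq \mathcal{D}'_{m'}$ and because,
by construction, $\mathcal{D}'_{m'}$ contains $\tilde{\mathcal{D}}_{m'}$ (see (26.8)); where the equality (26.12)
follows because the sets $\{\mathcal{D}_m\}$ are disjoint; and where the other equalities follow
by standard set-theoretic identities. It follows from (26.13) that
\[
\mathcal{D}_m \subseteq \tilde{\mathcal{D}}_m \cup (\mathcal{D}_m \setminus \mathcal{D}'_m) \cup \bigcup_{m' < m} (\mathcal{D}'_{m'} \setminus \mathcal{D}_{m'}) \tag{26.14}
\]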


⁶An optimal decision rule based on T is the Maximum A Posteriori rule that computes
the conditional distribution of M given T. But the Markov condition (26.5) implies that the
conditional law of M given T is the same as the conditional law given T and $(Y(t_1), \ldots, Y(t_n))$,
so an optimal decision rule based on T is as good as an optimal decision rule given T and
$(Y(t_1), \ldots, Y(t_n))$ and is therefore at least as good as $\tilde{\phi}(\cdot)$, which is based on $(Y(t_1), \ldots, Y(t_n))$
alone.

because for arbitrary sets A, B, C, the relation $A \supseteq B \setminus C$ implies that $B \subseteq A \cup C$.
From (26.14) and the Union-of-Events Bound (Theorem 21.5.1) we obtain
\begin{align*}
\Pr\bigl(\phi(\cdot) \text{ is correct} \bigm| M = m\bigr)
&= \Pr\bigl[(Y(t)) \in \mathcal{D}_m \bigm| M = m\bigr] \\
&\le \Pr\bigl[(Y(t)) \in \tilde{\mathcal{D}}_m \bigm| M = m\bigr] + \Pr\bigl[(Y(t)) \in (\mathcal{D}_m \setminus \mathcal{D}'_m) \bigm| M = m\bigr] \\
&\qquad + \sum_{m' < m} \Pr\bigl[(Y(t)) \in \mathcal{D}'_{m'} \setminus \mathcal{D}_{m'} \bigm| M = m\bigr] \\
&\le \Pr\bigl[(Y(t)) \in \tilde{\mathcal{D}}_m \bigm| M = m\bigr] + \sum_{m' \le m} \Pr\bigl[(Y(t)) \in \mathcal{D}'_{m'} \triangle \mathcal{D}_{m'} \bigm| M = m\bigr] \\
&< \Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) + m \, \frac{\epsilon}{\mathsf{M}} \\
&\le \Pr\bigl(\tilde{\phi}(\cdot) \text{ is correct} \bigm| M = m\bigr) + \epsilon, \quad m \in \mathcal{M},
\end{align*}
where the first inequality follows from (26.14) and the Union-of-Events bound; the
second inequality follows because for any two sets A and B we have $A \setminus B \subseteq A \triangle B$
and also $B \setminus A \subseteq A \triangle B$; the third inequality from (26.7); and the final inequality
because $m \in \{1, \ldots, \mathsf{M}\}$. This concludes the proof by establishing (26.9).
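
The set inclusion (26.14), on which the proof hinges, can be sanity-checked on small finite sets. The sketch below is only an illustration under assumptions made here: the universe of 30 integers, the random seed, and the helper names `disjointify` and `inclusion_26_14_holds` are hypothetical. The list `D` plays the role of the disjoint decision sets of the original rule, `Dp` the possibly overlapping approximating sets, and `Dt` the disjoint sets built inductively as in (26.8).

```python
import random

def disjointify(Dp):
    """Build disjoint sets as in (26.8): each set keeps only points not yet claimed."""
    Dt, claimed = [], set()
    for s in Dp:
        Dt.append(set(s) - claimed)
        claimed |= Dt[-1]
    return Dt

def inclusion_26_14_holds(D, Dp):
    """Check D[m] <= Dt[m] | (D[m] - Dp[m]) | union over m' < m of (Dp[m'] - D[m'])."""
    Dt = disjointify(Dp)
    for m in range(len(D)):
        rhs = Dt[m] | (D[m] - Dp[m])
        for mp in range(m):
            rhs |= Dp[mp] - D[mp]
        if not D[m] <= rhs:
            return False
    return True

rng = random.Random(0)
universe = list(range(30))
results = []
for _ in range(200):
    # Disjoint decision sets: each point gets one of 4 labels, or label 4 meaning "no set".
    labels = [rng.randrange(5) for _ in universe]
    D = [{x for x, lab in zip(universe, labels) if lab == m} for m in range(4)]
    # Arbitrary approximating sets that may overlap one another.
    Dp = [{x for x in universe if rng.random() < 0.4} for _ in range(4)]
    results.append(inclusion_26_14_holds(D, Dp))
print(all(results))
```

A finite check of this kind proves nothing, but it is a quick way to catch a slip in the bookkeeping of primes and tildes.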


26.4 Main Result 


The main result of this chapter is Theorem 26.4.1, which provides a sufficient 
statistic for the setup of Section 26.2. A more general version (Theorem 27.3.1) 
will be proved in Chapter 27. Nevertheless, we have chosen to provide a separate 
proof of Theorem 26.4.1 in Section 26.8 because the proof of this case is simpler. 


Theorem 26.4.1 (Inner Products with the Mean Signals Suffice). In the setup 
of Section 26.2, the random vector 


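\[
\biggl( \int_{-\infty}^{\infty} Y(t) \, s_1(t) \, dt, \; \ldots, \; \int_{-\infty}^{\infty} Y(t) \, s_{\mathsf{M}}(t) \, dt \biggr)^{\mathsf{T}} \tag{26.15}
\]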


forms a sufficient statistic for guessing M based on (Y(t)). 


Proof. See Section 26.8. 


Because the RV $\int_{-\infty}^{\infty} Y(t) \, s_m(t) \, dt$ can be viewed as a mapping that maps each $\omega \in \Omega$
to the inner product between its trajectory $t \mapsto Y(\omega, t)$ and the signal $t \mapsto s_m(t)$,
we denote this random variable by $\langle Y, s_m \rangle$.⁷ With this notation, the main result
is that the M inner products
\[
\langle Y, s_1 \rangle, \ldots, \langle Y, s_{\mathsf{M}} \rangle \tag{26.16}
\]


form a sufficient statistic for guessing M based on (Y(t)). 


⁷Here, as throughout, $(\Omega, \mathcal{F}, P)$ denotes the probability space over which all the random
variables and stochastic processes in the setup are defined.

Figure 26.1: Computing the inner product between the observed SP and each of
the mean signals and then basing one’s decision on these inner products. 


This theorem is extremely useful because in combination with Theorem 26.3.2 it 
demonstrates that, without loss of optimality, we can limit ourselves to guessing 
rules that use the observation to compute the M inner products (26.16) and that 
then base their decision on these inner products. Figure 26.1 illustrates such decision
rules. The theorem thus helps us to convert the guessing problem from one
with a continuous-time observation (Y(t)) to a problem of the kind we addressed
in Section 21.3, where the observable is a finite-dimensional random vector (the
inner products vector, which takes value in $\mathbb{R}^{\mathsf{M}}$).


We can generalize Theorem 26.4.1 using the linearity of the stochastic integral
(Lemma 25.10.3). This generalization allows us to further reduce the dimension of
the sufficient statistic vector from the number of messages M to the dimension d
of the linear subspace $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$ spanned by the mean signals $s_1, \ldots, s_{\mathsf{M}}$:
\[
d = \operatorname{Dim}\bigl( \operatorname{span}(s_1, \ldots, s_{\mathsf{M}}) \bigr). \tag{26.17}
\]

Corollary 26.4.2. Let $\tilde{s}_1, \ldots, \tilde{s}_n$ be integrable signals that are bandlimited to W Hz.⁸
If every mean signal can be written as a linear combination of $(\tilde{s}_1, \ldots, \tilde{s}_n)$,
then the random n-vector
\[
\biggl( \int_{-\infty}^{\infty} Y(t) \, \tilde{s}_1(t) \, dt, \; \ldots, \; \int_{-\infty}^{\infty} Y(t) \, \tilde{s}_n(t) \, dt \biggr)^{\mathsf{T}} \tag{26.18}
\]
forms a sufficient statistic for guessing M based on (Y(t)).


Proof of the corollary. By the corollary’s hypothesis, every mean signal $s_m$ can be
written as a linear combination of the signals $\{\tilde{s}_j\}_{j=1}^{n}$. Thus, to each $m \in \mathcal{M}$ there
correspond n coefficients (not necessarily unique) $\alpha_1^{(m)}, \ldots, \alpha_n^{(m)} \in \mathbb{R}$ such that
\[
s_m = \sum_{j=1}^{n} \alpha_j^{(m)} \, \tilde{s}_j. \tag{26.19}
\]


⁸The result also holds if the signals are not bandlimited, but we prefer to assume that they
are. 




Consequently, by the linearity of integration (Lemma 25.10.3), we can compute the 
integrals appearing in (26.15) from the random n-vector (26.18) using the relation 


\[
\int_{-\infty}^{\infty} Y(t) \, s_m(t) \, dt = \sum_{j=1}^{n} \alpha_j^{(m)} \int_{-\infty}^{\infty} Y(t) \, \tilde{s}_j(t) \, dt, \quad m \in \mathcal{M}.
\]

From the vector in (26.18) we can thus compute the vector in (26.15), and since
the latter forms a sufficient statistic (Theorem 26.4.1) it follows that the former
must also form a sufficient statistic (Proposition 22.4.2).⁹


We note that Corollary 26.4.2 does, indeed, generalize the theorem because, by
choosing n = M with $\tilde{s}_m = s_m$ for all $m \in \mathcal{M}$, we recover the theorem from
the corollary. More interesting is the case where $(\tilde{s}_1, \ldots, \tilde{s}_n)$ forms a basis for
$\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. In this case the corollary provides a sufficient statistic consisting
of a random d-vector, where d is the dimension of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. This reduces
the number of inner products needed to implement the receiver from M to d. As
we shall see, it is particularly convenient to choose $(\tilde{s}_1, \ldots, \tilde{s}_n)$ as an orthonormal
basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. In this case we shall prefer to refer to $\{\tilde{s}_j\}$ as $\{\phi_\ell\}_{\ell=1}^{d}$,
where, as before, d is the dimension of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$.


26.5 Analyzing the Sufficient Statistic 


26.5.1 The Conditional Law of the Sufficient Statistic 


Having reduced the guessing problem from one where the observation is a SP to 
one where it is a random vector, we can proceed to derive an optimal decision rule 
based on this vector. To derive such a rule we need the conditional distribution 
of this vector conditional on each of the hypotheses. Fortunately, this is easy 
for the problem at hand, because the Gaussianity of the noise (N(t)) implies 
that, conditional on each of the hypotheses, the vectors in (26.15) and (26.18) 
are Gaussian (Theorem 25.12.1). Their conditional distributions are thus fully 
specified by their mean vectors and covariance matrices. 


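The calculation of the mean vectors is straightforward. Indeed, by linearity and
by Proposition 25.10.1,
\begin{align*}
E\biggl[ \int_{-\infty}^{\infty} Y(t) \, s_j(t) \, dt \biggm| M = m \biggr]
&= E\biggl[ \int_{-\infty}^{\infty} \bigl( s_m(t) + N(t) \bigr) \, s_j(t) \, dt \biggr] \\
&= \int_{-\infty}^{\infty} s_m(t) \, s_j(t) \, dt + E\biggl[ \int_{-\infty}^{\infty} N(t) \, s_j(t) \, dt \biggr] \\
&= \langle s_m, s_j \rangle, \quad j, m \in \mathcal{M}.
\end{align*}

Thus, for every $m \in \mathcal{M}$, the conditional mean of the vector in (26.15), conditional
on M = m, is the vector
\[
\bigl( \langle s_m, s_1 \rangle, \ldots, \langle s_m, s_{\mathsf{M}} \rangle \bigr)^{\mathsf{T}}. \tag{26.20}
\]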


⁹For the pedantic reader one should add that, by Proposition 25.10.1, the vector in (26.18)
is measurable with respect to the σ-algebra generated by (Y(t)).

The calculation of the conditional covariance matrices requires a simple application
of Proposition 25.15.2. It yields that the covariance matrix of the vector in (26.15),
conditional on M = m, is given by the M × M matrix
\[
\frac{N_0}{2}
\begin{pmatrix}
\langle s_1, s_1 \rangle & \langle s_1, s_2 \rangle & \cdots & \langle s_1, s_{\mathsf{M}} \rangle \\
\langle s_2, s_1 \rangle & \langle s_2, s_2 \rangle & \cdots & \langle s_2, s_{\mathsf{M}} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle s_{\mathsf{M}}, s_1 \rangle & \langle s_{\mathsf{M}}, s_2 \rangle & \cdots & \langle s_{\mathsf{M}}, s_{\mathsf{M}} \rangle
\end{pmatrix}. \tag{26.21}
\]

Note that the conditional covariance matrix does not depend on the hypothesis m
on which we are conditioning, because this hypothesis only influences the mean
of (Y(t)).

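More generally, for the sufficient statistic vector in (26.18) we obtain that for every
$m \in \mathcal{M}$ the conditional distribution of this vector, conditional on M = m, is
Gaussian with the n-dimensional mean vector
\[
\bigl( \langle s_m, \tilde{s}_1 \rangle, \ldots, \langle s_m, \tilde{s}_n \rangle \bigr)^{\mathsf{T}} \tag{26.22}
\]
and the n × n covariance matrix
\[
\frac{N_0}{2}
\begin{pmatrix}
\langle \tilde{s}_1, \tilde{s}_1 \rangle & \langle \tilde{s}_1, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_1, \tilde{s}_n \rangle \\
\langle \tilde{s}_2, \tilde{s}_1 \rangle & \langle \tilde{s}_2, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_2, \tilde{s}_n \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \tilde{s}_n, \tilde{s}_1 \rangle & \langle \tilde{s}_n, \tilde{s}_2 \rangle & \cdots & \langle \tilde{s}_n, \tilde{s}_n \rangle
\end{pmatrix}. \tag{26.23}
\]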


(The assumption that the signals $\{\tilde{s}_j\}$ are bandlimited to W Hz is not needed in
Corollary 26.4.2, but it is needed for the above conditional law to hold.) 
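
As a numerical illustration of the conditional mean vector and covariance matrix just derived, the sketch below approximates the inner products by Riemann sums on a uniform grid. Everything specific in it is an assumption made for illustration: the grid step `dt`, the two example waveforms, and the helper names `inner` and `conditional_law`. The rectangular example pulses are not bandlimited and serve only to exercise the arithmetic.

```python
def inner(u, v, dt):
    """Riemann-sum approximation of <u, v>, the integral of u(t) v(t) dt."""
    return sum(a * b for a, b in zip(u, v)) * dt

def conditional_law(signals, m, N0, dt):
    """Mean vector given M = m (0-based index) and covariance matrix (N0/2) * Gram."""
    mean = [inner(signals[m], s, dt) for s in signals]
    cov = [[(N0 / 2) * inner(si, sj, dt) for sj in signals] for si in signals]
    return mean, cov

dt = 0.01
n = 100                                      # grid covers one unit of time
s1 = [1.0] * n                               # unit-energy constant pulse
s2 = [1.0] * (n // 2) + [-1.0] * (n // 2)    # sign flip halfway, orthogonal to s1
mean, cov = conditional_law([s1, s2], m=0, N0=2.0, dt=dt)
print(mean)  # approximately [1.0, 0.0]
print(cov)   # approximately [[1.0, 0.0], [0.0, 1.0]]
```

Only the mean depends on the index `m`; the covariance is computed once from the pairwise inner products, in line with the remark that the hypothesis influences only the mean of the observation.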


26.5.2 It Is all in the Geometry! 


It is interesting to note that the conditional mean vector in (26.20) and the conditional
covariance matrix in (26.21) are fully determined by $N_0/2$ and by the inner
products
\[
\bigl\{ \langle s_{m'}, s_{m''} \rangle \bigr\}_{m', m'' \in \mathcal{M}}; \tag{26.24}
\]
the PSD of the noise (N(t)) outside the band $f \in [-W, W]$ is immaterial. Similarly,
except in determining the pairwise inner products, the exact waveforms of the mean
signals are immaterial. Since the conditional distribution of the sufficient statistic
vector (26.15) is Gaussian, and since the distribution of a Gaussian vector is fully
determined by its mean vector and its covariance matrix (Theorem 23.6.7), we
can conclude:

Note 26.5.1. The conditional distribution of the sufficient statistic vector (26.15)
given each of the hypotheses is determined by $N_0$ and by the inner products in
(26.24). The PSD of the noise at frequencies outside the band $[-W, W]$ is immaterial.


Note, however, that the calculation of the sufficient statistic from the observa- 
tion (Y(t)) requires more than just knowledge of the inner products in (26.24); the 
calculation of the vector (26.15) requires knowledge of the waveforms $s_1, \ldots, s_{\mathsf{M}}$.

Since an optimal decision rule for guessing M based on (Y(t)) can be based on the
sufficient statistic (Theorem 26.3.2), and since the conditional distribution of the
sufficient statistic given each of the hypotheses depends only on $N_0$ and the inner
products (26.24), it follows that:


Proposition 26.5.2. For the setup of Section 26.2, the minimal probability of error
that can be achieved in guessing M based on (Y(t)) is determined by $N_0$, by the
inner products (26.24), and by the prior $\{\pi_m\}_{m \in \mathcal{M}}$.


26.5.3 Orthonormal Bases


The conditional distribution of the sufficient statistic given each of the hypotheses
is easier to manipulate if we choose the functions $\{\tilde{s}_j\}$ in (26.18) to form an orthonormal
basis for the linear subspace spanned by the mean signals. In this case
we denote the basis functions by $\phi_1, \ldots, \phi_d$ so
\begin{align}
\operatorname{span}(\phi_1, \ldots, \phi_d) &= \operatorname{span}(s_1, \ldots, s_{\mathsf{M}}), \tag{26.25a} \\
\langle \phi_{\ell'}, \phi_{\ell''} \rangle &= I\{\ell' = \ell''\}, \quad \ell', \ell'' \in \{1, \ldots, d\}, \tag{26.25b}
\end{align}
where d is the dimension of the linear subspace spanned by the mean signals (26.17).
Such functions $\phi_1, \ldots, \phi_d$ can be found, for example, using the Gram-Schmidt
procedure (Section 4.6.6).¹⁰ We denote the sufficient statistic vector (26.18) by
\[
T = \bigl( T^{(1)}, \ldots, T^{(d)} \bigr)^{\mathsf{T}}, \qquad T^{(\ell)} = \int_{-\infty}^{\infty} Y(t) \, \phi_\ell(t) \, dt, \quad \ell \in \{1, \ldots, d\}. \tag{26.26}
\]


Figure 26.2 depicts a block diagram of a circuit that computes the inner products 
of the received waveform with each of the basis signals. 
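
The Gram-Schmidt step can be illustrated on sampled waveforms. The following is a minimal sketch, not a faithful treatment of continuous-time signals: it assumes the signals are given as lists of samples on a uniform grid of step `dt`, approximates inner products by Riemann sums, and drops any signal whose residual energy falls below a tolerance `tol`. The names `inner`, `gram_schmidt`, `tol`, and the example pulses are all assumptions of this illustration.

```python
import math

def inner(u, v, dt):
    """Riemann-sum approximation of <u, v>, the integral of u(t) v(t) dt."""
    return sum(a * b for a, b in zip(u, v)) * dt

def gram_schmidt(signals, dt, tol=1e-9):
    """Return orthonormal sampled signals spanning the same space as `signals`."""
    basis = []
    for s in signals:
        r = list(s)
        for phi in basis:
            c = inner(r, phi, dt)                       # component along phi
            r = [ri - c * pi for ri, pi in zip(r, phi)]
        energy = inner(r, r, dt)
        if energy > tol:                                # skip signals already in the span
            norm = math.sqrt(energy)
            basis.append([ri / norm for ri in r])
    return basis

dt = 0.01
n = 100
s1 = [1.0] * n
s2 = [1.0] * (n // 2) + [-1.0] * (n // 2)
s3 = [a + 2 * b for a, b in zip(s1, s2)]    # a linear combination of s1 and s2
basis = gram_schmidt([s1, s2, s3], dt)
d = len(basis)
print(d)
```

Here three mean signals span a two-dimensional space, so only two inner products with the received waveform would be needed instead of three.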


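By (26.22) and (26.23) we obtain that for every $m \in \mathcal{M}$ the conditional distribution
of T given that M = m is Gaussian with mean
\[
E[T \mid M = m] = \bigl( \langle s_m, \phi_1 \rangle, \ldots, \langle s_m, \phi_d \rangle \bigr)^{\mathsf{T}} \tag{26.27}
\]
and covariance matrix $(N_0/2) \, I_d$, where $I_d$ denotes the d × d identity matrix. The
components of T are thus conditionally independent and of equal variance $N_0/2$
(but not of equal mean). Consequently, we can express the conditional density
of T, conditional on M = m, at every point $t = (t^{(1)}, \ldots, t^{(d)})^{\mathsf{T}} \in \mathbb{R}^d$ using this
conditional independence and the explicit form of the univariate Gaussian density
(19.6) as
\begin{align}
f_{T|M=m}(t) &= \prod_{\ell=1}^{d} \frac{1}{\sqrt{2\pi N_0/2}} \exp\biggl( -\frac{\bigl( t^{(\ell)} - \langle s_m, \phi_\ell \rangle \bigr)^2}{2 N_0/2} \biggr) \notag \\
&= \frac{1}{(\pi N_0)^{d/2}} \exp\biggl( -\frac{1}{N_0} \sum_{\ell=1}^{d} \bigl( t^{(\ell)} - \langle s_m, \phi_\ell \rangle \bigr)^2 \biggr), \quad t \in \mathbb{R}^d. \tag{26.28}
\end{align}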


¹⁰Since the mean signals are bandlimited, the only zero-energy element of $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$
is the all-zero signal (Note 6.4.2). Consequently, $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$ has an orthonormal basis
(Proposition 4.6.10), which can be found using the Gram-Schmidt procedure (Section 4.6.6).

Figure 26.2: Computing the inner products $T^{(\ell)} = \langle Y, \phi_\ell \rangle$ for $\ell = 1, \ldots, d$ from
the received waveform.


Note that with proper translation (Table 26.1) the conditional distribution of T is 
very similar to the one we addressed in Section 21.6; see (21.50). In fact, it is a 
special case of the distribution studied in Section 21.6: Y there corresponds to T 
here; J there corresponds to d here; $\sigma^2$ there corresponds to $N_0/2$ here; and the
mean vector $s_m$ associated with Message m there corresponds to the vector
\[
\bigl( \langle s_m, \phi_1 \rangle, \ldots, \langle s_m, \phi_d \rangle \bigr)^{\mathsf{T}} \tag{26.29}
\]


here. Consequently, we can use the results from Section 21.6 and, more specifically, 
Proposition 21.6.1, to derive an optimal decision rule for guessing M based on T. 
We adopt this approach when we next derive an optimal decision rule for our setup. 


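                                              In Section 21.6                          Here

number of components of observed vector       $J$                                      $d$

variance of noise added to each component     $\sigma^2$                               $N_0/2$

number of hypotheses                          $\mathsf{M}$                             $\mathsf{M}$

conditional mean of observation               $(s_m^{(1)}, \ldots, s_m^{(J)})^{\mathsf{T}}$    $(\langle s_m, \phi_1 \rangle, \ldots, \langle s_m, \phi_d \rangle)^{\mathsf{T}}$
given M = m

sum of squared components of mean vector      $\sum_{j=1}^{J} (s_m^{(j)})^2$           $\sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2 = \int_{-\infty}^{\infty} s_m^2(t) \, dt$


Table 26.1: The setup in Section 21.6 and here.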


26.6 Optimal Guessing Rule 


We are finally ready to derive an optimal guessing rule for our setup. Recall that, by
Corollary 26.4.2, if $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for the linear space spanned
by the mean signals, then the random vector T defined in (26.26) forms a sufficient
statistic for guessing M based on (Y(t)). Consequently, by Theorem 26.3.2, it is
optimal to use the observation (Y(t)) to compute the vector T and to then use the
MAP rule to guess M based on T (Theorem 21.3.3). We shall do just that. We
first present the resulting rule in terms of the orthonormal basis $(\phi_1, \ldots, \phi_d)$ and
then show that the rule does not depend on the specific choice of the orthonormal
basis.


In deriving the decision rule we shall repeatedly use the fact that if $(\phi_1, \ldots, \phi_d)$ is
an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$, then, by Proposition 4.6.4,
\[
s_m = \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle \, \phi_\ell, \quad m \in \mathcal{M}, \tag{26.30}
\]
and, by Proposition 4.6.9,
\[
\|s_m\|_2^2 = \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2, \quad m \in \mathcal{M}. \tag{26.31}
\]


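26.6.1 The Decision Rule in Terms of $(\phi_1, \ldots, \phi_d)$


As we have noted, the conditional density $f_{T|M=m}(\cdot)$ in (26.28) is of the form we
discussed in Section 21.6 (Table 26.1). By Proposition 21.6.1 we thus obtain:


Theorem 26.6.1. Let M, $s_1, \ldots, s_{\mathsf{M}}$, and (Y(t)) be as in our setup, and let the
d-tuple $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$.


(i) The decision rule that guesses uniformly at random from among all the messages
$m \in \mathcal{M}$ for which
\[
\ln \pi_m - \frac{\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_m, \phi_\ell \rangle \bigr)^2}{N_0}
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} - \frac{\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{m'}, \phi_\ell \rangle \bigr)^2}{N_0} \biggr\} \tag{26.32}
\]
minimizes the probability of a guessing error.


(ii) If M has a uniform distribution, then this rule does not depend on the value
of $N_0$. It chooses uniformly at random from among all the messages $m \in \mathcal{M}$
for which
\[
\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_m, \phi_\ell \rangle \bigr)^2
= \min_{m' \in \mathcal{M}} \biggl\{ \sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{m'}, \phi_\ell \rangle \bigr)^2 \biggr\}. \tag{26.33}
\]


(iii) If M has a uniform distribution and, in addition, the mean signals are of
equal energy, i.e.,
\[
\|s_1\|_2 = \|s_2\|_2 = \cdots = \|s_{\mathsf{M}}\|_2, \tag{26.34}
\]
then these decision rules are equivalent to the maximum-correlation rule that
guesses uniformly from among all the messages $m \in \mathcal{M}$ for which
\[
\sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle \langle Y, \phi_\ell \rangle
= \max_{m' \in \mathcal{M}} \biggl\{ \sum_{\ell=1}^{d} \langle s_{m'}, \phi_\ell \rangle \langle Y, \phi_\ell \rangle \biggr\}. \tag{26.35}
\]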


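Proof. The theorem follows directly from Proposition 21.6.1. For Part (iii) we
need to note that, by (26.31), Condition (26.34) is equivalent to the condition
\[
\sum_{\ell=1}^{d} \langle s_1, \phi_\ell \rangle^2 = \sum_{\ell=1}^{d} \langle s_2, \phi_\ell \rangle^2 = \cdots = \sum_{\ell=1}^{d} \langle s_{\mathsf{M}}, \phi_\ell \rangle^2, \tag{26.36}
\]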


which is the condition needed in Proposition 21.6.1. 
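
Once the vector of inner products has been computed, the rule of Part (i) is a small finite-dimensional computation. The sketch below is a hedged illustration, not a receiver implementation: the function name `map_decide`, the 0-based message indices, and the example mean vectors are assumptions made here, and ties are detected by exact floating-point equality, which is a simplification. The function returns the whole set of maximizers, from which the theorem's rule would pick one uniformly at random.

```python
import math

def map_decide(T, means, priors, N0):
    """Indices (0-based) of all messages maximizing ln(pi_m) - ||T - mean_m||^2 / N0."""
    scores = []
    for mu, p in zip(means, priors):
        dist2 = sum((t - a) ** 2 for t, a in zip(T, mu))   # squared Euclidean distance
        scores.append(math.log(p) - dist2 / N0)
    best = max(scores)
    return [m for m, sc in enumerate(scores) if sc == best]

# Hypothetical mean vectors (the inner products of three mean signals with phi_1, phi_2).
means = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)]
T = (0.1, 0.2)
uniform_guess = map_decide(T, means, [1 / 3, 1 / 3, 1 / 3], N0=1.0)
skewed_guess = map_decide(T, means, [0.05, 0.9, 0.05], N0=1.0)
print(uniform_guess, skewed_guess)
```

With a uniform prior the rule reduces to nearest-neighbor decoding, as in Part (ii); with a strongly skewed prior the logarithmic term can pull the decision toward a message whose mean vector is farther from T.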


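Note that, because $(\phi_1, \ldots, \phi_d)$ is an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$, the
signals $s_{m'}$ and $s_{m''}$ differ if, and only if, the vectors $(\langle s_{m'}, \phi_1 \rangle, \ldots, \langle s_{m'}, \phi_d \rangle)^{\mathsf{T}}$
and $(\langle s_{m''}, \phi_1 \rangle, \ldots, \langle s_{m''}, \phi_d \rangle)^{\mathsf{T}}$ in $\mathbb{R}^d$ differ. Consequently, by Proposition 21.6.2:


Note 26.6.2. If the mean signals $s_1, \ldots, s_{\mathsf{M}}$ are distinct, then the probability of a
tie, i.e., that more than one message $m \in \mathcal{M}$ satisfies (26.32), is zero.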


26.6.2 The Decision Rule without Reference to a Basis 


We next derive a representation of our decision rule without reference to a specific 
orthonormal basis. 


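Theorem 26.6.3. Consider the problem of guessing M based on (Y(t)) in our
setup.


(i) The decision rule that guesses uniformly at random from among all the messages
$m \in \mathcal{M}$ for which
\[
\ln \pi_m + \frac{2}{N_0} \biggl( \int_{-\infty}^{\infty} Y(t) \, s_m(t) \, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_m^2(t) \, dt \biggr)
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} + \frac{2}{N_0} \biggl( \int_{-\infty}^{\infty} Y(t) \, s_{m'}(t) \, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{m'}^2(t) \, dt \biggr) \biggr\} \tag{26.37}
\]
minimizes the probability of error.


(ii) If M has a uniform distribution, then this rule does not depend on the value
of $N_0$. It chooses uniformly at random from among all the messages $m \in \mathcal{M}$
for which
\[
\int_{-\infty}^{\infty} Y(t) \, s_m(t) \, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_m^2(t) \, dt
= \max_{m' \in \mathcal{M}} \biggl\{ \int_{-\infty}^{\infty} Y(t) \, s_{m'}(t) \, dt - \frac{1}{2} \int_{-\infty}^{\infty} s_{m'}^2(t) \, dt \biggr\}. \tag{26.38}
\]

(iii) If M has a uniform distribution and, in addition, the mean signals are of
equal energy, i.e.,
\[
\|s_1\|_2 = \|s_2\|_2 = \cdots = \|s_{\mathsf{M}}\|_2,
\]
then these decision rules are equivalent to the maximum-correlation rule that
guesses uniformly from among all the messages $m \in \mathcal{M}$ for which
\[
\int_{-\infty}^{\infty} Y(t) \, s_m(t) \, dt = \max_{m' \in \mathcal{M}} \biggl\{ \int_{-\infty}^{\infty} Y(t) \, s_{m'}(t) \, dt \biggr\}. \tag{26.39}
\]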

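Proof. We shall prove Part (i) using Theorem 26.6.1 (i). To this end we begin by
noting that
\[
\ln \pi_{m'} - \frac{\sum_{\ell=1}^{d} \bigl( \langle Y, \phi_\ell \rangle - \langle s_{m'}, \phi_\ell \rangle \bigr)^2}{N_0}
\]
can be expressed by opening the square as
\[
\ln \pi_{m'} - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle^2 + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_{m'}, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_{m'}, \phi_\ell \rangle^2.
\]
Since the term
\[
\frac{1}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle^2
\]
does not depend on the hypothesis, it is optimal to choose a message at random
from among all the messages m satisfying
\[
\ln \pi_m + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_m, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle^2
= \max_{m' \in \mathcal{M}} \biggl\{ \ln \pi_{m'} + \frac{2}{N_0} \sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_{m'}, \phi_\ell \rangle - \frac{1}{N_0} \sum_{\ell=1}^{d} \langle s_{m'}, \phi_\ell \rangle^2 \biggr\}.
\]

Part (i) of the theorem now follows from this rule using (26.31) and by noting that
\begin{align*}
\sum_{\ell=1}^{d} \langle Y, \phi_\ell \rangle \langle s_m, \phi_\ell \rangle
&= \biggl\langle Y, \sum_{\ell=1}^{d} \langle s_m, \phi_\ell \rangle \, \phi_\ell \biggr\rangle \\
&= \langle Y, s_m \rangle, \quad m \in \mathcal{M},
\end{align*}
where the first equality follows by linearity (Lemma 25.10.3) and the second from
(26.30).


Part (ii) follows by noting that if M is uniform, then $\ln \pi_m$ does not depend on the
hypothesis m.


Part (iii) follows from Part (ii) because if all the mean signals are of equal energy,
then the term
\[
\int_{-\infty}^{\infty} s_m^2(t) \, dt
\]
does not depend on the hypothesis.

By Note 26.6.2 we have: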
Note 26.6.4. If the mean signals are distinct, then the probability of a tie is zero. 
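
The basis-free form of the rule can be sketched directly on sampled waveforms. This is an illustration under stated assumptions rather than a model of the continuous-time receiver: inner products are approximated by Riemann sums with step `dt`, the observed path `y` is a noiseless stand-in, message indices are 0-based, and the names `inner` and `decide_from_waveform` are hypothetical. Ties are resolved in favor of the smallest index, which is harmless here because ties have probability zero when the mean signals are distinct.

```python
import math

def inner(u, v, dt):
    """Riemann-sum approximation of <u, v>, the integral of u(t) v(t) dt."""
    return sum(a * b for a, b in zip(u, v)) * dt

def decide_from_waveform(y, signals, priors, N0, dt):
    """Maximize ln(pi_m) + (2/N0) * (<y, s_m> - ||s_m||^2 / 2), as in (26.37)."""
    scores = [
        math.log(p) + (2.0 / N0) * (inner(y, s, dt) - 0.5 * inner(s, s, dt))
        for s, p in zip(signals, priors)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

dt = 0.01
n = 100
s1 = [1.0] * n
s2 = [-1.0] * n                     # antipodal to s1
y = [0.3] * n                       # noiseless stand-in for an observed sample path
guess_uniform = decide_from_waveform(y, [s1, s2], [0.5, 0.5], N0=1.0, dt=dt)
guess_skewed = decide_from_waveform(y, [s1, s2], [0.1, 0.9], N0=4.0, dt=dt)
print(guess_uniform, guess_skewed)
```

No orthonormal basis appears in this computation; only the correlations of the observation with the mean signals and the signal energies are needed.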


26.7 Performance Analysis 


The decision rule we derived in Section 26.6.1 uses the observed SP (Y(t)) to 
compute the vector T of inner products with an orthonormal basis $(\phi_1, \ldots, \phi_d)$
via (26.26), with the result that the vector T has the conditional law specified 
in (26.28). Our decision rule then performs MAP decoding of M based on T. 
Consequently, the performance of our decoding rule is identical to the performance 
of the MAP rule for guessing M based on a vector T having the conditional law 
(26.28). The performance of this latter decoding rule was studied in Section 21.6. 
All that remains is to translate the results from that section in order to obtain 
performance bounds on our decoder. 


To translate the results from Section 21.6 we need to substitute $N_0/2$ for $\sigma^2$ there;
d for J there; and (26.29) for the mean vectors there. But there is one more 
translation we need: the bounds in Section 21.6 are expressed in terms of the 
Euclidean distance between the mean vectors, and here we prefer to express the 
bounds in terms of the distance between the mean signals. Fortunately, as we next 


show, the translation is straightforward. Because (@1,...,@a) is an orthonormal 
basis for span(si,...,Sm), it follows from Proposition 4.6.9 that 
d 
S— (v, be)” = IIvll3, Vv € span(si,-..,sm). (26.40) 
f=1 


Substituting $s_{m'} - s_{m''}$ for $v$ in this identity yields

$$\sum_{\ell=1}^{d} \bigl( \langle s_{m'}, \phi_\ell \rangle - \langle s_{m''}, \phi_\ell \rangle \bigr)^2 = \| s_{m'} - s_{m''} \|_2^2 = \int_{-\infty}^{\infty} \bigl( s_{m'}(t) - s_{m''}(t) \bigr)^2 \, dt,$$

where we have also used the fact that for $v = s_{m'} - s_{m''}$ we have, by the linearity
of the inner product in its left argument, $\langle v, \phi_\ell \rangle = \langle s_{m'}, \phi_\ell \rangle - \langle s_{m''}, \phi_\ell \rangle$. Thus, the
squared Euclidean distance between two mean vectors in Section 21.6 is equal to
the energy in the difference between the corresponding mean signals in our setup.
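The identity can be checked numerically in a discrete-time analogue, where a signal is a finite list of samples and the inner product is a plain sum. The sketch below is only an illustration under that assumption; the helper names (`inner`, `gram_schmidt`, `mean_vector`) and the sample signals are hypothetical and do not come from the text.

```python
import math

def inner(u, v):
    """Discrete-time stand-in for the inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(signals, tol=1e-9):
    """Return an orthonormal basis for the span of the given signals."""
    basis = []
    for s in signals:
        r = list(s)
        for phi in basis:
            c = inner(r, phi)
            r = [ri - c * pi for ri, pi in zip(r, phi)]
        norm = math.sqrt(inner(r, r))
        if norm > tol:  # skip signals already in the span
            basis.append([ri / norm for ri in r])
    return basis

# Hypothetical mean signals; the third is the sum of the first two, so d = 2.
signals = [[1.0, 2.0, 0.0, -1.0], [0.5, -1.0, 3.0, 2.0], [1.5, 1.0, 3.0, 1.0]]
basis = gram_schmidt(signals)

def mean_vector(s):
    """Inner products of s with the basis: the mean vector of T."""
    return [inner(s, phi) for phi in basis]

def squared_distance_vectors(s_a, s_b):
    """Squared Euclidean distance between the two mean vectors."""
    va, vb = mean_vector(s_a), mean_vector(s_b)
    return sum((a - b) ** 2 for a, b in zip(va, vb))

def energy_of_difference(s_a, s_b):
    """Energy in the difference between the two mean signals."""
    return sum((a - b) ** 2 for a, b in zip(s_a, s_b))
```

For every pair of signals in the list, `squared_distance_vectors` and `energy_of_difference` should agree up to rounding, which is the discrete counterpart of the statement above.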
Denoting by $p_{\text{MAP}}(\text{error} \mid M = m)$ the conditional probability of error of our decoder
conditional on $M = m$, and denoting by $p^*(\text{error})$ its unconditional probability of
error (which is the optimal probability of error)

$$p^*(\text{error}) = \sum_{m \in \mathcal{M}} \pi_m \, p_{\text{MAP}}(\text{error} \mid M = m), \tag{26.41}$$


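we obtain from (21.57)

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \neq m} Q\!\left( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \right) \tag{26.42}$$

and hence, by (26.41),

$$p^*(\text{error}) \le \sum_{m \in \mathcal{M}} \pi_m \sum_{m' \neq m} Q\!\left( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \right). \tag{26.43}$$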


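When M is uniform these bounds simplify to

$$p_{\text{MAP}}(\text{error} \mid M = m) \le \sum_{m' \neq m} Q\!\left( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \right), \quad M \text{ uniform} \tag{26.44}$$

and

$$p^*(\text{error}) \le \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \sum_{m' \neq m} Q\!\left( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \right), \quad M \text{ uniform}. \tag{26.45}$$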


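Similarly, we can use the results from Section 21.6 to lower-bound the probability
of a guessing error. Indeed, using (21.63) we obtain

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \neq m} Q\!\left( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \right) \tag{26.46}$$

and

$$p^*(\text{error}) \ge \sum_{m \in \mathcal{M}} \pi_m \max_{m' \neq m} Q\!\left( \frac{\|s_m - s_{m'}\|_2}{\sqrt{2 N_0}} + \frac{\sqrt{N_0/2}}{\|s_m - s_{m'}\|_2} \ln \frac{\pi_m}{\pi_{m'}} \right). \tag{26.47}$$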


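For a uniform prior these bounds simplify to

$$p_{\text{MAP}}(\text{error} \mid M = m) \ge \max_{m' \neq m} Q\!\left( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \right), \quad M \text{ uniform}, \tag{26.48}$$

$$p^*(\text{error}) \ge \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \max_{m' \neq m} Q\!\left( \sqrt{\frac{\|s_m - s_{m'}\|_2^2}{2 N_0}} \right), \quad M \text{ uniform}. \tag{26.49}$$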
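For a uniform prior, the upper bound (26.45) and the lower bound (26.49) depend on the signals only through the pairwise energies of their differences, so they are easy to evaluate. The following is a minimal sketch, assuming each mean signal is given by its coordinates with respect to an orthonormal basis (so that sums of squares equal energies); the function names are hypothetical.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def energy_of_difference(s_a, s_b):
    """Energy in the difference of two signals given by their coordinates."""
    return sum((a - b) ** 2 for a, b in zip(s_a, s_b))

def pairwise_terms(signals, n0):
    """Row m holds Q(sqrt(||s_m - s_m'||^2 / (2 N0))) for all m' != m."""
    count = len(signals)
    return [[Q(math.sqrt(energy_of_difference(signals[m], signals[k]) / (2.0 * n0)))
             for k in range(count) if k != m]
            for m in range(count)]

def union_bound(signals, n0):
    """Upper bound (26.45) on p*(error) for a uniform prior."""
    return sum(sum(row) for row in pairwise_terms(signals, n0)) / len(signals)

def lower_bound(signals, n0):
    """Lower bound (26.49) on p*(error) for a uniform prior."""
    return sum(max(row) for row in pairwise_terms(signals, n0)) / len(signals)
```

With only two signals the two bounds coincide, because each inner sum and each maximum then contains a single term.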


26.8 Proof of Theorem 26.4.1 


26.8.1 A Lemma 


We begin with a lemma regarding sufficient statistics in testing whether a random
vector Y was drawn $\mathcal{N}(\mu, \Lambda)$ or $\mathcal{N}(-\mu, \Lambda)$.


Lemma 26.8.1. Let H be a binary RV, and let the random vector Y be $\mathcal{N}(\mu, \Lambda)$
conditional on H = 0 and $\mathcal{N}(-\mu, \Lambda)$ conditional on H = 1. If $\mu$ is a scalar multiple
of the last column of $\Lambda$, then the last component of Y forms a sufficient statistic
for guessing H based on Y.
Proof. Let n denote the number of components of the vectors Y and $\mu$, so $\Lambda$
is n × n. To show that $Y^{(n)}$ is a sufficient statistic we shall calculate the log
likelihood-ratio function and then show that it is computable from $Y^{(n)}$. This
approach, while straightforward, does not prove the lemma in its fullest generality
because it only covers the case where Y has a density, i.e., when the covariance
matrix $\Lambda$ is nonsingular. Referring the reader to Section 26.14 on Page 606 for a
somewhat less intuitive proof that covers all cases, we proceed here to address the
case where $\Lambda$ is nonsingular.


The condition that $\mu$ is a scalar multiple of the last column of $\Lambda$ is equivalent to
the existence of some $\alpha \in \mathbb{R}$ such that

$$\mu = \Lambda \, (0, \ldots, 0, \alpha)^{\mathsf{T}}. \tag{26.50}$$


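When $\Lambda$ is nonsingular we can use the explicit form of the density of the multivariate
Gaussian distribution (23.56) to express the log likelihood-ratio as

$$\begin{aligned}
\ln \frac{f_{Y|H=0}(y)}{f_{Y|H=1}(y)}
&= \ln \frac{\frac{1}{\sqrt{(2\pi)^n \det \Lambda}} \, e^{-\frac{1}{2} (y - \mu)^{\mathsf{T}} \Lambda^{-1} (y - \mu)}}{\frac{1}{\sqrt{(2\pi)^n \det \Lambda}} \, e^{-\frac{1}{2} (y + \mu)^{\mathsf{T}} \Lambda^{-1} (y + \mu)}} \\
&= \tfrac{1}{2} (y + \mu)^{\mathsf{T}} \Lambda^{-1} (y + \mu) - \tfrac{1}{2} (y - \mu)^{\mathsf{T}} \Lambda^{-1} (y - \mu) \\
&= y^{\mathsf{T}} \Lambda^{-1} \mu + \mu^{\mathsf{T}} \Lambda^{-1} y \\
&= 2 y^{\mathsf{T}} \Lambda^{-1} \mu && (26.51) \\
&= 2 y^{\mathsf{T}} \Lambda^{-1} \Lambda \, (0, \ldots, 0, \alpha)^{\mathsf{T}} && (26.52) \\
&= 2 \alpha \, y^{(n)}, \quad y \in \mathbb{R}^n, && (26.53)
\end{aligned}$$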


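where (26.51) holds because the scalar $\mu^{\mathsf{T}} \Lambda^{-1} y$ is equal to its transpose, and the
latter, by the transposition law $(AB)^{\mathsf{T}} = B^{\mathsf{T}} A^{\mathsf{T}}$, is given by $y^{\mathsf{T}} (\Lambda^{-1})^{\mathsf{T}} \mu$, which by
the symmetry of $\Lambda$ (and hence also of its inverse), is equal to $y^{\mathsf{T}} \Lambda^{-1} \mu$; and where
(26.52) follows from (26.50).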


It follows from (26.53) that the likelihood-ratio is computable from the last com- 
ponent of Y, thus establishing that this component forms a sufficient statistic 
(Definition 20.12.2). 
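The conclusion of the proof, that the log likelihood-ratio equals twice the scalar times the last component of the observation, can be sanity-checked on a small example. The sketch below assumes a 2 × 2 nonsingular covariance matrix and inverts it by hand; the numbers and helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a nonsingular 2 x 2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def quad(v, m_inv):
    """Quadratic form v^T m_inv v."""
    w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m_inv]
    return sum(vi * wi for vi, wi in zip(v, w))

def log_likelihood_ratio(y, mu, cov):
    """ln f_{Y|H=0}(y) - ln f_{Y|H=1}(y) for means +mu and -mu, common covariance."""
    cov_inv = inv2(cov)
    y_plus = [yi + mi for yi, mi in zip(y, mu)]
    y_minus = [yi - mi for yi, mi in zip(y, mu)]
    return 0.5 * quad(y_plus, cov_inv) - 0.5 * quad(y_minus, cov_inv)

cov = [[2.0, 0.6], [0.6, 1.0]]               # hypothetical covariance matrix
alpha = 1.7                                   # hypothetical scalar
mu = [alpha * cov[0][1], alpha * cov[1][1]]   # alpha times the last column
y = [0.3, -1.2]                               # hypothetical observation
llr = log_likelihood_ratio(y, mu, cov)
```

Changing the first component of `y` while keeping the last one fixed should leave `llr` unchanged, which is what sufficiency of the last component predicts.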


26.8.2 The Binary Antipodal Case 


We begin the proof of Theorem 26.4.1 by considering the special case of binary 
hypothesis testing (M = 2) where the mean signals are antipodal to each other, 
i.e., when their sum is the all-zero signal. Since we are now treating the binary 
hypothesis testing setting, we denote the RV we wish to guess by H and assume 
that it takes value in the set {0,1}. We denote the mean signal corresponding to 
H =0 by s, so the mean signal corresponding to H = 1 is —s. We assume that s 
is an integrable signal that is bandlimited to W Hz. Conditional on H = 0, the
received signal (Y(t)) is given at each $t \in \mathbb{R}$ by s(t) + N(t), where (N(t)) is white
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W. Conditional on
H = 1 the time-t received signal is −s(t) + N(t).

Recall from Definition 26.3.1 that to show that $\langle Y, s \rangle$ forms a sufficient statistic
for guessing H based on the observation (Y(t)) we need to show that for every
positive integer n and every choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$ the RV $\langle Y, s \rangle$
forms a sufficient statistic for guessing H based on the observation consisting of
the random vector¹¹

$$\bigl( Y(t_1), \ldots, Y(t_n), \langle Y, s \rangle \bigr)^{\mathsf{T}}. \tag{26.54}$$


This we prove by showing that this vector satisfies the assumptions of Lemma 26.8.1. 


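Denoting the conditional mean of this vector, conditional on H = 0, by $\mu$, we have

$$\mu = \bigl( s(t_1), \ldots, s(t_n), \|s\|_2^2 \bigr)^{\mathsf{T}}, \tag{26.55}$$

because

$$\begin{aligned}
\mathsf{E}[Y(t_j) \mid H = 0] &= \mathsf{E}[s(t_j) + N(t_j) \mid H = 0] \\
&= s(t_j) + \mathsf{E}[N(t_j)] \\
&= s(t_j), \quad j \in \{1, \ldots, n\},
\end{aligned}$$

and

$$\begin{aligned}
\mathsf{E}[\langle Y, s \rangle \mid H = 0] &= \mathsf{E}[\langle s + N, s \rangle] \\
&= \|s\|_2^2 + \mathsf{E}[\langle N, s \rangle] \\
&= \|s\|_2^2
\end{aligned}$$

(Theorem 25.12.2). The conditional covariance matrix $\Lambda$ of the vector in (26.54)
conditional on H = 0 is given by the (n + 1) × (n + 1) matrix

$$\Lambda = \begin{pmatrix}
K_{NN}(0) & K_{NN}(t_1 - t_2) & \cdots & K_{NN}(t_1 - t_n) & s(t_1) N_0/2 \\
K_{NN}(t_2 - t_1) & K_{NN}(0) & \cdots & K_{NN}(t_2 - t_n) & s(t_2) N_0/2 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
K_{NN}(t_n - t_1) & K_{NN}(t_n - t_2) & \cdots & K_{NN}(0) & s(t_n) N_0/2 \\
s(t_1) N_0/2 & s(t_2) N_0/2 & \cdots & s(t_n) N_0/2 & \|s\|_2^2 \, N_0/2
\end{pmatrix} \tag{26.56}$$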


(Proposition 25.15.2). Conditional on H = 1, the mean of the vector in (26.54)
is $-\mu$ and the covariance matrix is also $\Lambda$. From Proposition 25.11.1 regarding
linear functionals of Gaussian stochastic processes, it follows that, conditional on
H = 0, the vector in (26.54) is Gaussian. Likewise conditional on H = 1. And by
(26.55) & (26.56) the mean vector $\mu$ is equal to the last column of the covariance
matrix (26.56) scaled by $2/N_0$.


¹¹ The measurability of $\langle Y, s \rangle$ with respect to the σ-algebra generated by (Y(t)) follows from
Proposition 25.10.1.
Having established that the random vector (26.54) satisfies the hypotheses of
Lemma 26.8.1, we can infer that its last component, namely $\langle Y, s \rangle$, forms a
sufficient statistic for guessing H based on the vector in (26.54). Since $n \in \mathbb{N}$ and
$t_1, \ldots, t_n \in \mathbb{R}$ are here arbitrary, this proves that $\langle Y, s \rangle$ is sufficient for guessing H
based on (Y(t)), thus proving Theorem 26.4.1 for the two-hypotheses case with
antipodal mean signals.


26.8.3 The General Binary Case 


We next prove Theorem 26.4.1 in the more general binary hypothesis testing setting
where the mean signals are not necessarily antipodal. We denote the mean signals
corresponding to H = 0 and H = 1 by $s_0$ and $s_1$, and we assume that both are
integrable signals that are bandlimited to W Hz. We need to show that the vector

$$\bigl( \langle Y, s_0 \rangle, \langle Y, s_1 \rangle \bigr)^{\mathsf{T}} \tag{26.57}$$

forms a sufficient statistic.


Before giving a formal proof, we provide some intuition. Based on the
observation (Y(t)), the receiver can compute the waveform

$$\tilde{Y}(t) = Y(t) - \frac{s_0(t) + s_1(t)}{2}, \quad t \in \mathbb{R}.$$

Since the transformation from (Y(t)) to $(\tilde{Y}(t))$ is reversible, there is no loss in
optimality in basing one’s decision on $(\tilde{Y}(t))$. Conditional on H = 0 the SP $(\tilde{Y}(t))$
is of the form

$$\tilde{Y}(t) = \underbrace{s_0(t) + N(t)}_{Y(t)} - \frac{s_0(t) + s_1(t)}{2} = \frac{s_0(t) - s_1(t)}{2} + N(t), \quad t \in \mathbb{R},$$

whereas conditional on H = 1 it is of the form

$$\tilde{Y}(t) = \underbrace{s_1(t) + N(t)}_{Y(t)} - \frac{s_0(t) + s_1(t)}{2} = -\frac{s_0(t) - s_1(t)}{2} + N(t), \quad t \in \mathbb{R}.$$


Consequently, the problem of guessing H based on $(\tilde{Y}(t))$ is the antipodal problem
we addressed before with the received waveform being $(\tilde{Y}(t))$ and with the mean
signals corresponding to H = 0 and H = 1 being $(s_0 - s_1)/2$ and $-(s_0 - s_1)/2$
respectively. From our treatment of the antipodal case, we know that for this
problem $\langle \tilde{Y}, (s_0 - s_1)/2 \rangle$ forms a sufficient statistic. This sufficient statistic can
be written more explicitly as

$$\begin{aligned}
\Bigl\langle \tilde{Y}, \frac{s_0 - s_1}{2} \Bigr\rangle
&= \Bigl\langle Y, \frac{s_0 - s_1}{2} \Bigr\rangle - \Bigl\langle \frac{s_0 + s_1}{2}, \frac{s_0 - s_1}{2} \Bigr\rangle \\
&= \frac{1}{2} \langle Y, s_0 \rangle - \frac{1}{2} \langle Y, s_1 \rangle - \frac{1}{4} \bigl( \|s_0\|_2^2 - \|s_1\|_2^2 \bigr),
\end{aligned}$$

thus demonstrating that this sufficient statistic is computable from the vector
in (26.57).¹²
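The algebra behind this intuition is easy to confirm in a discrete-time analogue, where inner products are sums over samples. The following sketch is only an illustration under that assumption; the sample values and function names are hypothetical.

```python
def inner(u, v):
    """Discrete-time stand-in for the inner product <u, v>."""
    return sum(a * b for a, b in zip(u, v))

def statistic_from_shifted(y, s0, s1):
    """<Y~, (s0 - s1)/2>, computed directly from the shifted observation Y~."""
    y_tilde = [yi - (a + b) / 2.0 for yi, a, b in zip(y, s0, s1)]
    half_diff = [(a - b) / 2.0 for a, b in zip(s0, s1)]
    return inner(y_tilde, half_diff)

def statistic_from_inner_products(y_s0, y_s1, e0, e1):
    """The same statistic from <Y,s0>, <Y,s1> and the energies of s0 and s1."""
    return 0.5 * y_s0 - 0.5 * y_s1 - 0.25 * (e0 - e1)

# Hypothetical sampled observation and mean signals.
y = [0.4, -1.0, 2.5, 0.3]
s0 = [1.0, 0.0, -1.0, 2.0]
s1 = [0.5, 1.5, 0.0, -1.0]

direct = statistic_from_shifted(y, s0, s1)
via_vector = statistic_from_inner_products(
    inner(y, s0), inner(y, s1), inner(s0, s0), inner(s1, s1))
```

The two values agree up to rounding, so a receiver that only keeps the two inner products and knows the signal energies loses nothing relative to one that forms the shifted waveform.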


For readers who prefer a more formal proof we offer the following. Define

$$s = \frac{s_0 - s_1}{2}. \tag{26.58}$$

Since $\langle Y - (s_0 + s_1)/2, s \rangle$ is computable from the vector in (26.57), it follows from
Proposition 22.4.2 that to prove that the vector in (26.57) is sufficient it is enough to
establish that $\langle Y - (s_0 + s_1)/2, s \rangle$ is sufficient. We thus need to show that for every
$n \in \mathbb{N}$ and for every choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$, the RV $\langle Y - (s_0 + s_1)/2, s \rangle$
forms a sufficient statistic for the hypothesis testing problem of guessing H based
on $Y(t_1), \ldots, Y(t_n), \langle Y - (s_0 + s_1)/2, s \rangle$, i.e., that for every prior on H,

$$H \;\multimap\!\!-\; \Bigl\langle Y - \frac{s_0 + s_1}{2}, s \Bigr\rangle \;\multimap\!\!-\; \bigl( Y(t_1), \ldots, Y(t_n) \bigr).$$

Equivalently, since subtracting deterministic quantities does not alter conditional
independence, it suffices to show that

$$H \;\multimap\!\!-\; \Bigl\langle Y - \frac{s_0 + s_1}{2}, s \Bigr\rangle \;\multimap\!\!-\; \Bigl( Y(t_1) - \frac{s_0(t_1) + s_1(t_1)}{2}, \ldots, Y(t_n) - \frac{s_0(t_n) + s_1(t_n)}{2} \Bigr).$$

This can be proved by applying Lemma 26.8.1 to the vector

$$\Bigl( Y(t_1) - \frac{s_0(t_1) + s_1(t_1)}{2}, \ldots, Y(t_n) - \frac{s_0(t_n) + s_1(t_n)}{2}, \Bigl\langle Y - \frac{s_0 + s_1}{2}, s \Bigr\rangle \Bigr)^{\mathsf{T}},$$

which, conditional on H = 0, is Gaussian, with the covariance matrix in (26.56)
and with the mean vector being the RHS of (26.55) (with s defined in (26.58)) and
which, conditional on H = 1, is Gaussian with the same covariance matrix (26.56)
but with the conditional mean being antipodal to the RHS of (26.55).


26.8.4 The General Case 


We now prove the general (not necessarily binary) case of Theorem 26.4.1. There 
is surprisingly little left to do. The key is Proposition 22.3.2, which demonstrates 
that if a function of the observation is sufficient for testing between any two of the 
hypotheses, then it is sufficient for the multi-hypothesis testing problem. 


To prove that the vector (26.15) of inner products forms a sufficient statistic we
need to show that for every $n \in \mathbb{N}$ and for any choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$
the inner products vector (26.15) forms a sufficient statistic for guessing M based
on the observation consisting of $Y(t_1), \ldots, Y(t_n)$ and of the inner products vector
(Definition 26.3.1). By Proposition 22.3.2, it is enough to show this when testing
between any two fixed distinct messages $m', m'' \in \mathcal{M}$. But in this case the
sufficiency of the inner products vector (26.15) follows directly from Section 26.8.3 and
Proposition 22.4.2 because, by the general binary hypothesis testing case treated in
Section 26.8.3, the two inner products $\langle Y, s_{m'} \rangle$ and $\langle Y, s_{m''} \rangle$ suffice for this problem,
and these two inner products are obviously computable from the inner products
vector (26.15) (simply by ignoring its other components). This completes the proof
of Theorem 26.4.1.


¹² This is only a heuristic argument because it only shows that it is optimal to guess H based
on the vector (26.57). It does not prove that this vector forms a sufficient statistic.


26.9 The Front-End Filter 


Receivers in practice rarely have the structure depicted in Figure 26.1 because— 
although mathematically optimal—its hardware implementation is challenging. 
The difficulty is related to the “dynamic range” problem in implementing the 
matched filter: it is very difficult to design a perfectly-linear system to exact specifi- 
cation. Linearity is usually only guaranteed for a certain range of input amplitudes. 
Once the amplitude of the signal exceeds a certain level, the circuit often “clips” 
the input waveform and no longer behaves linearly. Similarly, input signals that are 
too small might be below the sensitivity of the circuit and might therefore produce 
no output, thus violating linearity. This is certainly the case with circuits that 
employ analog-to-digital conversion followed by digital processing, because analog- 
to-digital converters can only represent the input using a fixed number of bits. 
The problem with the structure depicted in Figure 26.1 is that the noise (N(t)) is 
typically much larger than the mean signal, so it becomes very difficult to design a 
circuit to exact specifications that will be linear enough to guarantee that its action 
on the received waveform (consisting of the weak transmitted waveform and the 
strong additive noise) be the sum of the required responses to the mean signal and 
to the noise-signal. (That the noise is typically much larger than the mean signals 
can be seen from the heuristic plot of its PSD; see Figure 25.3. White Gaussian 
noise is often of PSD $N_0/2$ over frequency bands that are much larger than the
band $[-W, W]$ so, by (25.30), the variance of the noise can be extremely large.)


The engineering solution to the dynamic range problem is to pass the received 
waveform through a “front-end filter” and to then feed this filter’s output to the 
matched filter; see Figure 26.3. Except for a few very stringent requirements, 
the specifications of the front-end filter are relatively lax. The first specification 
is that the filter be linear over a very large range of input levels. This is usu- 
ally accomplished by using only passive elements to design the filter. The second 
requirement is that the front-end filter’s frequency response be of unit-gain over 
the mean signals’ frequency band [-W,W] so that it will not distort the mean 
signals.¹³ Additionally, we require that the filter be stable and that its frequency
response decay to zero sharply for frequencies outside the band $[-W, W]$. This
latter condition guarantees that the filter’s response to the noise be of small variance
so that the dynamic range of the signal at the filter’s output be moderate. If we
denote the front-end filter’s impulse response by $h_{\text{FE}}$, then the key mathematical
requirements are linearity; stability, i.e.,

$$\int_{-\infty}^{\infty} |h_{\text{FE}}(t)| \, dt < \infty; \tag{26.59}$$

and the unit-gain requirement

$$\hat{h}_{\text{FE}}(f) = 1, \quad |f| \le W. \tag{26.60}$$


¹³ Imprecisions here can often be corrected using signal processing.

Figure 26.3: Feeding the signal to a front-end filter and then computing the inner
products with the mean signals.

Figure 26.4: An example of the frequency response of a front-end filter.


An example of the frequency response of a front-end filter is depicted in Figure 26.4. 


In the rest of this section we shall prove that, as long as these assumptions are met, 
there is no loss in optimality in introducing the front-end filter as in Figure 26.3. 
(In the ideal mathematical world there is, of course, nothing to be gained from this 
filter, because the structure we introduced in Figure 26.1 is optimal.) 

The crux of the proof is in showing that, like (Y(t)), the front-end filter’s output
is the sum of the transmitted signal and white Gaussian noise of PSD $N_0/2$ with
respect to the bandwidth W. Once this is established, the result follows by recalling
that the conditional joint distribution of the matched filters’ outputs does not
depend on the PSD of the noise outside the band $[-W, W]$ (Note 26.5.1).


We thus proceed to analyze the front-end filter’s output, which we denote by $(\tilde{Y}(t))$:

$$(\tilde{Y}(t)) = (Y(t)) \star h_{\text{FE}}. \tag{26.61}$$
We first note that (26.60) and the assumption that $s_m$ is an integrable signal that
is bandlimited to W Hz guarantee that

$$s_m \star h_{\text{FE}} = s_m, \quad m \in \mathcal{M} \tag{26.62}$$

(Proposition 6.5.2 and Proposition 6.4.10 cf. (b)). By (26.62) and by the linearity
of the filter we can thus express the filter’s output (conditional on M = m) as

$$\begin{aligned}
(\tilde{Y}(t)) &= (Y(t)) \star h_{\text{FE}} \\
&= s_m \star h_{\text{FE}} + (N(t)) \star h_{\text{FE}} \\
&= s_m + (N(t)) \star h_{\text{FE}}.
\end{aligned} \tag{26.63}$$

We next show that the SP $(N(t)) \star h_{\text{FE}}$ on the RHS of (26.63) is white Gaussian
noise of PSD $N_0/2$ with respect to the bandwidth W. This follows from
Theorem 25.13.2. Indeed, being the result of passing a measurable stationary Gaussian
SP through a stable filter, it is a measurable stationary Gaussian SP. And its PSD
is

$$f \mapsto S_{NN}(f) \, \bigl| \hat{h}_{\text{FE}}(f) \bigr|^2, \tag{26.64}$$

which is equal to $N_0/2$ for all frequencies $f \in [-W, W]$, because for these
frequencies $S_{NN}(f)$ is equal to $N_0/2$ and $\hat{h}_{\text{FE}}(f)$ is equal to one. Note that at frequencies
outside the band $[-W, W]$ the PSD of $(N(t)) \star h_{\text{FE}}$ may differ from that of (N(t)).


We thus conclude that the front-end filter’s output, like its input, can be expressed
as the transmitted signal corrupted by white Gaussian noise of PSD $N_0/2$ with
respect to the bandwidth W. Note 26.5.1 now guarantees that for every $m \in \mathcal{M}$
we have that, conditional on M = m, the distribution of

$$\Bigl( \int_{-\infty}^{\infty} \tilde{Y}(t) \, s_1(t) \, dt, \; \ldots, \; \int_{-\infty}^{\infty} \tilde{Y}(t) \, s_{\mathsf{M}}(t) \, dt \Bigr)^{\mathsf{T}}$$

is identical to the conditional distribution of the random vector in (26.15).


The advantage of the front-end filter becomes apparent when we re-examine the 
PSD of the noise at its output. If the front-end filter’s frequency response decays 
very sharply to zero for frequencies outside the band $[-W, W]$, then, by (26.64),
this PSD will be nearly zero outside this band. Consequently, the variance of the 
noise at the front-end filter’s output—which is the integral of this PSD—will be 
greatly reduced. This will guarantee that the dynamic range at the filter’s output 
be much smaller than at its input, thus simplifying the implementation of the 
matched filters. 
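The reduction in noise variance can be illustrated by integrating the output PSD (26.64) numerically. The sketch below makes two modelling assumptions that are not in the text: the noise PSD equals N0/2 on [−B, B] and is zero elsewhere, and the front-end response has unit gain on [−W, W] followed by a linear roll-off of width `delta`. All names and parameter values are hypothetical.

```python
def h_fe(f, w, delta):
    """Hypothetical front-end frequency response: 1 on [-W, W], linear roll-off."""
    a = abs(f)
    if a <= w:
        return 1.0
    if a < w + delta:
        return 1.0 - (a - w) / delta
    return 0.0

def output_noise_variance(n0, w, delta, b, steps=200000):
    """Midpoint-rule integral of (N0/2) |h_fe(f)|^2 over [-B, B]."""
    df = 2.0 * b / steps
    total = 0.0
    for k in range(steps):
        f = -b + (k + 0.5) * df
        total += (n0 / 2.0) * h_fe(f, w, delta) ** 2 * df
    return total

def input_noise_variance(n0, b):
    """Integral of the assumed input PSD N0/2 over [-B, B]."""
    return n0 * b
```

Under these assumptions the output variance is N0 (W + delta/3), which for B much larger than W is far smaller than the input variance N0 B, while the PSD on [−W, W] is untouched.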


26.10 Detection in Passband 


The detection problem in passband is very similar to the one in baseband. The
difference is that the mean signals $\{s_m\}$ are now assumed to be integrable signals
that are bandlimited to W Hz around the carrier frequency $f_c$ (Definition 7.2.1)
and that the noise is now assumed to be white Gaussian noise of PSD $N_0/2$ with
respect to the bandwidth W around $f_c$ (Definition 25.15.3).

Here too, the inner products in (26.16) form a sufficient statistic. So do those in
(26.18) whenever the signals $\{\tilde{s}_j\}$ satisfy

$$s_m \in \operatorname{span}(\tilde{s}_1, \ldots, \tilde{s}_n), \quad m \in \mathcal{M}$$

and are integrable signals that are bandlimited to W Hz around $f_c$.


For every $m \in \mathcal{M}$ the conditional distribution of the vector of inner products
in (26.18), conditional on M = m, is Gaussian with mean vector (26.22) and
covariance matrix (26.23). The latter covariance matrix can also be written in
terms of the baseband representation of the mean signals using the relation

$$\langle \tilde{s}_{j'}, \tilde{s}_{j''} \rangle = 2 \, \mathrm{Re}\bigl( \langle \tilde{s}_{j',\text{BB}}, \tilde{s}_{j'',\text{BB}} \rangle \bigr), \tag{26.65}$$

where $\tilde{s}_{j',\text{BB}}$ and $\tilde{s}_{j'',\text{BB}}$ are the baseband representations of $\tilde{s}_{j'}$ and $\tilde{s}_{j''}$
(Theorem 7.6.10).

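The computation of the inner products (26.18) can be performed in passband
by feeding the signal Y directly to filters that are matched to the passband
signals $\{\tilde{s}_j\}$, or in baseband by expressing the inner product $\langle Y, \tilde{s}_j \rangle$ in terms of the
baseband representation $\tilde{s}_{j,\text{BB}}$ of $\tilde{s}_j$ as follows:

$$\begin{aligned}
\langle Y, \tilde{s}_j \rangle &= \bigl\langle Y, \; t \mapsto 2 \, \mathrm{Re}\bigl( \tilde{s}_{j,\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr) \bigr\rangle \\
&= 2 \int_{-\infty}^{\infty} \bigl( Y(t) \cos(2 \pi f_c t) \bigr) \, \mathrm{Re}\bigl( \tilde{s}_{j,\text{BB}}(t) \bigr) \, dt \\
&\quad - 2 \int_{-\infty}^{\infty} \bigl( Y(t) \sin(2 \pi f_c t) \bigr) \, \mathrm{Im}\bigl( \tilde{s}_{j,\text{BB}}(t) \bigr) \, dt.
\end{aligned}$$

This expression suggests computing the inner product $\langle Y, \tilde{s}_j \rangle$ using two baseband
matched filters: one that is matched to $\mathrm{Re}(\tilde{s}_{j,\text{BB}})$ and that is fed the product
of (Y(t)) and $\cos(2 \pi f_c t)$, and one that is matched to $\mathrm{Im}(\tilde{s}_{j,\text{BB}})$ and that is fed the
product of (Y(t)) and $\sin(2 \pi f_c t)$.¹⁴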
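The two-branch decomposition can be checked with Riemann sums on a sampled waveform. The sketch below is a discrete-time illustration only: the sampling step, the carrier frequency, and the sample waveforms are hypothetical, and the sums merely stand in for the integrals above.

```python
import cmath
import math

def passband_inner_product(y, s_bb, fc, dt):
    """Riemann sum for <Y, t -> 2 Re(s_bb(t) e^{i 2 pi fc t})>."""
    total = 0.0
    for k, (yk, sk) in enumerate(zip(y, s_bb)):
        carrier = cmath.exp(2j * math.pi * fc * k * dt)
        total += yk * 2.0 * (sk * carrier).real
    return total * dt

def two_branch_inner_product(y, s_bb, fc, dt):
    """Same quantity via an in-phase and a quadrature baseband correlation."""
    in_phase = 0.0
    quadrature = 0.0
    for k, (yk, sk) in enumerate(zip(y, s_bb)):
        angle = 2.0 * math.pi * fc * k * dt
        in_phase += yk * math.cos(angle) * sk.real
        quadrature += yk * math.sin(angle) * sk.imag
    return (2.0 * in_phase - 2.0 * quadrature) * dt

# Hypothetical sampled waveforms.
dt, fc = 0.01, 7.0
y = [math.sin(0.3 * k) + 0.1 * k for k in range(64)]
s_bb = [complex(math.exp(-((k - 20) / 8.0) ** 2), 0.5 * math.sin(0.2 * k))
        for k in range(64)]
```

Both functions return the same value up to rounding, reflecting the identity Re(z e^{iθ}) = Re(z) cos θ − Im(z) sin θ applied sample by sample.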


As discussed in Section 26.9, in practice one typically first feeds the received
signal (Y(t)) to a stable highly-linear bandpass filter of frequency response $\hat{h}_{\text{PB-FE}}(\cdot)$
satisfying

$$\hat{h}_{\text{PB-FE}}(f) = 1, \quad \bigl| |f| - f_c \bigr| \le W/2, \tag{26.66}$$

with the frequency response decaying drastically at other frequencies to guarantee
that the filter’s output be of small dynamic range.


¹⁴ Since the baseband representation of an integrable passband signal that is bandlimited to W
Hz around the carrier frequency $f_c$ is integrable (Proposition 7.6.2), it follows that our assumption
that $\tilde{s}_j$ is an integrable function that is bandlimited to W Hz around the carrier frequency $f_c$
guarantees that both $t \mapsto \cos(2 \pi f_c t) \, \mathrm{Re}(\tilde{s}_{j,\text{BB}}(t))$ and $t \mapsto \sin(2 \pi f_c t) \, \mathrm{Im}(\tilde{s}_{j,\text{BB}}(t))$ are
integrable. Hence, with probability one, both the integrals $\int_{-\infty}^{\infty} (Y(t) \cos(2 \pi f_c t)) \, \mathrm{Re}(\tilde{s}_{j,\text{BB}}(t)) \, dt$ and
$\int_{-\infty}^{\infty} (Y(t) \sin(2 \pi f_c t)) \, \mathrm{Im}(\tilde{s}_{j,\text{BB}}(t)) \, dt$ exist.
26.11 Some Examples


26.11.1 Binary Hypothesis Testing 


Before treating the general binary hypothesis testing problem we begin with the
case of antipodal signaling with a uniform prior. In this case

$$s_0 = -s_1 = s, \tag{26.67}$$

where s is some integrable signal that is bandlimited to W Hz. We denote its
energy by $E_s$, i.e.,

$$E_s = \|s\|_2^2 \tag{26.68}$$

and assume that it is strictly positive. In this case the dimension of the linear
subspace spanned by the mean signals is one, and this subspace is spanned by the
unit-norm signal

$$\phi = \frac{s}{\|s\|_2}. \tag{26.69}$$

Depending on the outcome of a fair coin toss, either s or −s is sent over the channel.
We observe the SP (Y(t)) given by the sum of the transmitted signal and white
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W, and we wish to
guess which signal was sent. How should we form our guess?


By Theorem 26.4.1 a sufficient statistic for this guessing problem is $T = \langle Y, \phi \rangle$.
Conditional on H = 0, we have $T \sim \mathcal{N}(\sqrt{E_s}, N_0/2)$, whereas, conditional on
H = 1, we have $T \sim \mathcal{N}(-\sqrt{E_s}, N_0/2)$. How to guess H based on T is the problem
we addressed in Section 20.10. There we showed that it is optimal to guess “H = 0”
if T > 0 and to guess “H = 1” if T < 0. (The case T = 0 occurs with probability
zero, so we need not worry about it.) An optimal decision rule for guessing H
based on (Y(t)) is thus:

$$\text{Guess “} H = 0 \text{” if } \int_{-\infty}^{\infty} Y(t) \, s(t) \, dt \ge 0. \tag{26.70}$$


Let $p_{\text{MAP}}(\text{error} \mid s)$ denote the conditional probability of error of this decision rule
given that s was sent; let $p_{\text{MAP}}(\text{error} \mid -s)$ be similarly defined; and let $p^*(\text{error})$
denote the optimal probability of error of this problem. By the optimality of our
rule,

$$p^*(\text{error}) = \frac{1}{2} \bigl( p_{\text{MAP}}(\text{error} \mid s) + p_{\text{MAP}}(\text{error} \mid -s) \bigr).$$

Using the expression for the error probability derived in Section 20.10 we obtain

$$p^*(\text{error}) = Q\!\left( \sqrt{\frac{2 \|s\|_2^2}{N_0}} \right), \tag{26.71}$$

which, in view of (26.68), can also be written as

$$p^*(\text{error}) = Q\!\left( \sqrt{\frac{2 E_s}{N_0}} \right). \tag{26.72}$$
Note that, as expected from Section 26.5.2 and in particular from Proposition 26.5.2,
the probability of error is determined by the “geometry” of the problem, which in
this case is summarized by the energy in s.


There is also a nice geometric interpretation to (26.72). The distance between the
mean signals s and −s is $\|s - (-s)\|_2 = 2\sqrt{E_s}$. Half the distance is $\sqrt{E_s}$. The
inner product between the noise and the unit-length vector $\phi$ pointing from −s
to s is $\mathcal{N}(0, N_0/2)$. Half the distance thus corresponds to $\sqrt{E_s} / \sqrt{N_0/2}$ standard
deviations of this inner product. The probability of error is thus the probability
that a standard Gaussian is greater than half the distance between the signals as
measured by standard deviations of the inner product between the noise and the
unit-length vector pointing from −s towards s.
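Since the guessing problem reduces to a scalar Gaussian observation, formula (26.72) can be compared against a direct simulation of the statistic T. The sketch below is only an illustration; the function names, the seed, and the parameter values are hypothetical, and the simulated value is a noisy estimate rather than an exact number.

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def antipodal_error_probability(es, n0):
    """Formula (26.72): Q(sqrt(2 Es / N0))."""
    return Q(math.sqrt(2.0 * es / n0))

def simulate_antipodal(es, n0, trials, seed=0):
    """Estimate the error probability by simulating T ~ N(+-sqrt(Es), N0/2)."""
    rng = random.Random(seed)
    sigma = math.sqrt(n0 / 2.0)
    errors = 0
    for _ in range(trials):
        h = rng.randrange(2)                        # fair coin toss
        mean = math.sqrt(es) if h == 0 else -math.sqrt(es)
        t = rng.gauss(mean, sigma)                  # the sufficient statistic T
        guess = 0 if t >= 0 else 1                  # threshold rule
        errors += (guess != h)
    return errors / trials
```

With a few hundred thousand trials the estimate should fall within a fraction of a percent of the closed-form value.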


Consider now the more general binary hypothesis testing problem where both
hypotheses are still equally likely, but where now the mean signals $s_0$ and $s_1$ are not
antipodal, i.e., they do not sum to zero. Our approach to this problem is to reduce
it to the antipodal case we already treated. We begin by forming the signal $(\tilde{Y}(t))$
by subtracting $(s_0 + s_1)/2$ from the received signal, so

$$\tilde{Y}(t) = Y(t) - \frac{1}{2} \bigl( s_0(t) + s_1(t) \bigr), \quad t \in \mathbb{R}. \tag{26.73}$$

Since Y(t) can be recovered from $\tilde{Y}(t)$ by adding $(s_0(t) + s_1(t))/2$, the smallest
probability of a guessing error that can be achieved based on $(\tilde{Y}(t))$ is no larger
than that which can be achieved based on (Y(t)). (The two are, in fact, the same
because $(\tilde{Y}(t))$ can be computed from (Y(t)).)


The advantage of using $(\tilde{Y}(t))$ becomes apparent once we compute its conditional
law given H. Conditional on H = 0, we have $\tilde{Y}(t) = (s_0(t) - s_1(t))/2 + N(t)$,
whereas conditional on H = 1, we have $\tilde{Y}(t) = -(s_0(t) - s_1(t))/2 + N(t)$. Thus, the
guessing problem given $(\tilde{Y}(t))$ is exactly the problem we addressed in the antipodal
case with $(s_0 - s_1)/2$ playing the role of s. We thus obtain that an optimal decision
rule is to guess “H = 0” if $\int \tilde{Y}(t) \, (s_0(t) - s_1(t))/2 \, dt$ is nonnegative. Or stated in
terms of (Y(t)) using (26.73):

$$\text{Guess “} H = 0 \text{” if } \int_{-\infty}^{\infty} \Bigl( Y(t) - \frac{s_0(t) + s_1(t)}{2} \Bigr) \frac{s_0(t) - s_1(t)}{2} \, dt \ge 0. \tag{26.74}$$

The error probability associated with this decision rule is obtained from (26.71) by
substituting $(s_0 - s_1)/2$ for s:

$$p^*(\text{error}) = Q\!\left( \frac{\|s_0 - s_1\|_2}{\sqrt{2 N_0}} \right). \tag{26.75}$$


This expression too has a nice geometric interpretation. The inner product between
the noise and the unit-norm signal that is pointing from $s_1$ to $s_0$ is $\mathcal{N}(0, N_0/2)$. The
“distance” between the signals is $\|s_0 - s_1\|_2$. Half the distance is $\|s_0 - s_1\|_2 / 2$,
which corresponds to

$$\frac{\|s_0 - s_1\|_2 / 2}{\sqrt{N_0/2}}$$

standard deviations of a $\mathcal{N}(0, N_0/2)$ random variable. The probability of error
(26.75) is thus the probability that the inner product between the noise and the
unit-norm signal that is pointing from $s_1$ to $s_0$ exceeds half the distance between
the signals.
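Formula (26.75) depends on the two signals only through the energy of their difference, which makes it simple to evaluate. The following sketch assumes each signal is described by its coordinates with respect to an orthonormal basis; the function name and the example signals are hypothetical.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binary_error_probability(s0, s1, n0):
    """Formula (26.75): Q(||s0 - s1||_2 / sqrt(2 N0)) for equally likely signals."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(s0, s1)))
    return Q(distance / math.sqrt(2.0 * n0))

es, n0 = 1.0, 1.0
# Antipodal pair: distance 2 sqrt(Es), so the result matches (26.72).
p_antipodal = binary_error_probability([math.sqrt(es), 0.0], [-math.sqrt(es), 0.0], n0)
# Orthogonal pair of equal energy: distance sqrt(2 Es).
p_orthogonal = binary_error_probability([math.sqrt(es), 0.0], [0.0, math.sqrt(es)], n0)
```

At equal energy the antipodal pair is farther apart than the orthogonal pair, so `p_antipodal` is the smaller of the two.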


26.11.2 8-PSK 


We next present an example of detection in passband. For concreteness we consider
8-PSK, which stands for “8-ary Phase Shift Keying.” Here the number of hypotheses
is eight, so $\mathcal{M} = \{1, 2, \ldots, 8\}$ and $\mathsf{M} = 8$. We assume that M is uniformly
distributed over $\mathcal{M}$. Conditional on M = m, the received signal is given by

$$Y(t) = s_m(t) + N(t), \quad t \in \mathbb{R}, \tag{26.76}$$

where

$$s_m(t) = 2 \, \mathrm{Re}\bigl( c_m \, s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr), \quad t \in \mathbb{R}; \tag{26.77}$$

$$c_m = \alpha \, e^{i 2 \pi m / 8}, \quad m \in \mathcal{M} \tag{26.78}$$

for some positive real $\alpha$; the baseband signal $s_{\text{BB}}$ is an integrable complex signal
that is bandlimited to W/2 Hz and of unit energy

$$\|s_{\text{BB}}\|_2 = 1; \tag{26.79}$$

the carrier frequency $f_c$ satisfies $f_c > W/2$; and (N(t)) is white Gaussian noise
of PSD $N_0/2$ with respect to the bandwidth W around the carrier frequency $f_c$
(Definition 25.15.3). Irrespective of M, the transmitted energy $E_s$ is given by

$$\begin{aligned}
E_s &= \|s_m\|_2^2 \\
&= \int_{-\infty}^{\infty} \Bigl( 2 \, \mathrm{Re}\bigl( c_m \, s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr) \Bigr)^2 dt \\
&= 2 \alpha^2,
\end{aligned} \tag{26.80}$$

as can be verified using the relationship between energy in passband and baseband
(Theorem 7.6.10) and using (26.79).


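The transmitted waveform $s_m$ can also be written in a form that is highly suggestive
of a choice of an orthonormal basis for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$:

$$\begin{aligned}
s_m(t) &= \sqrt{2} \, \mathrm{Re}(c_m) \underbrace{\sqrt{2} \, \mathrm{Re}\bigl( s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr)}_{\phi_1(t)} + \sqrt{2} \, \mathrm{Im}(c_m) \underbrace{\sqrt{2} \, \mathrm{Re}\bigl( i \, s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr)}_{\phi_2(t)} \\
&= \sqrt{2} \, \mathrm{Re}(c_m) \, \phi_1(t) + \sqrt{2} \, \mathrm{Im}(c_m) \, \phi_2(t),
\end{aligned}$$

where

$$\begin{aligned}
\phi_1(t) &\triangleq \sqrt{2} \, \mathrm{Re}\bigl( s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr), \quad t \in \mathbb{R}, \\
\phi_2(t) &\triangleq \sqrt{2} \, \mathrm{Re}\bigl( i \, s_{\text{BB}}(t) \, e^{i 2 \pi f_c t} \bigr), \quad t \in \mathbb{R}.
\end{aligned}$$


Figure 26.5: Region of points $(t^{(1)}, t^{(2)})$ resulting in guessing “M = 1.”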


From Theorem 7.6.10 on inner products in passband and baseband, it follows that
$\phi_1$ and $\phi_2$ are orthogonal. Also, from that theorem and from (26.79), it follows
that they are of unit energy. Thus, the tuple $(\phi_1, \phi_2)$ is an orthonormal basis
for $\operatorname{span}(s_1, \ldots, s_{\mathsf{M}})$. Consequently, the vector $T = (\langle Y, \phi_1 \rangle, \langle Y, \phi_2 \rangle)^{\mathsf{T}}$ forms a
sufficient statistic for guessing M based on (Y(t)), and, conditional on M = m, the
components of T are independent with $T^{(1)} \sim \mathcal{N}(\sqrt{2} \alpha \cos(2 \pi m / 8), N_0/2)$ and with
$T^{(2)} \sim \mathcal{N}(\sqrt{2} \alpha \sin(2 \pi m / 8), N_0/2)$. We have thus reduced the guessing problem to
that of guessing M based on a two-dimensional vector T.


The problem of guessing M based on T was studied in Section 21.4. To lift the
results from that section, we need to substitute $\sqrt{2} \alpha$ for A and to substitute $N_0/2$
for $\sigma^2$. For example, the region where we guess “M = 1” is given in Figure 26.5.


For the scenario we described, some engineers prefer to use complex random
variables (Chapter 17). Rather than viewing T as a two-dimensional real random
vector, they prefer to view it as a (scalar) complex random variable whose real
part is $\langle Y, \phi_1 \rangle$ and whose imaginary part is $\langle Y, \phi_2 \rangle$. Conditional on M = m, this
CRV has the form

$$\sqrt{2} \, c_m + Z, \quad Z \sim \mathcal{N}_{\mathbb{C}}(0, N_0), \tag{26.81}$$

where $\mathcal{N}_{\mathbb{C}}(0, N_0)$ denotes the circularly-symmetric variance-$N_0$ complex Gaussian
distribution (Note 24.3.13).


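The expression for the probability of error of our detector can also be lifted from
Section 21.4. Substituting, as above, $\sqrt{2} \alpha$ for A and $N_0/2$ for $\sigma^2$, we obtain
from (21.25) that the conditional probability of error $p_{\text{MAP}}(\text{error} \mid M = m)$ of our
proposed decision rule is given for every $m \in \mathcal{M}$ by

$$p_{\text{MAP}}(\text{error} \mid M = m) = \frac{1}{\pi} \int_{0}^{\pi - \psi} e^{-\frac{2 \alpha^2 \sin^2 \psi}{N_0 \sin^2(\theta + \psi)}} \, d\theta, \quad \psi = \frac{\pi}{8}.$$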


The conditional probability of error can also be expressed in terms of the
transmitted energy $E_s$ using (26.80). Doing that and recalling that the conditional
probability of error does not depend on the transmitted message, we obtain that
the average probability of error $p^*(\text{error})$ is given by

$$p^*(\text{error}) = \frac{1}{\pi} \int_{0}^{\pi - \psi} e^{-\frac{E_s \sin^2 \psi}{N_0 \sin^2(\theta + \psi)}} \, d\theta, \quad \psi = \frac{\pi}{8}. \tag{26.82}$$


Note 26.11.1. The expression (26.82) continues to hold also for M-PSK where
$c_m = \alpha \, e^{i 2 \pi m / \mathsf{M}}$ for $\mathsf{M} \ge 2$ not necessarily equal to 8, provided that we define
$\psi = \pi / \mathsf{M}$ in (26.82).
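The integral in (26.82) has no elementary closed form for general M, but it is straightforward to evaluate numerically. The following is a minimal sketch using the midpoint rule; the function name, the number of integration steps, and the parameter values are hypothetical choices, not something prescribed by the text.

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def mpsk_error_probability(es, n0, m=8, steps=20000):
    """Midpoint-rule evaluation of (26.82) with psi = pi / m."""
    psi = math.pi / m
    upper = math.pi - psi
    d_theta = upper / steps
    total = 0.0
    for k in range(steps):
        theta = (k + 0.5) * d_theta
        exponent = -es * math.sin(psi) ** 2 / (n0 * math.sin(theta + psi) ** 2)
        total += math.exp(exponent)
    return total * d_theta / math.pi
```

Two sanity checks are available. For m = 2 the signals are antipodal, so the result should match Q(sqrt(2 Es/N0)) from (26.72). For m = 8 the error event is a union of two half-planes, each of probability Q(sqrt(2 Es/N0) sin(π/8)), so the result should lie between that value and twice that value.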


26.11.3 Orthogonal Keying 


We next consider M-ary orthogonal keying. We assume that the RV M that we
wish to guess is uniformly distributed over the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$.
The mean signals are assumed to be orthogonal and of equal (strictly) positive
energy $E_s$:

$$\langle s_{m'}, s_{m''} \rangle = E_s \, \mathrm{I}\{m' = m''\}, \quad m', m'' \in \mathcal{M}. \tag{26.83}$$

Since M is uniform, and since the mean signals are of equal energy, it follows
from Theorem 26.6.3 that to minimize the probability of guessing incorrectly, it is
optimal to correlate the received waveform (Y(t)) with each of the mean signals
and to pick the message whose mean signal gives the highest correlation:

$$\text{Guess “} m \text{” if } \langle Y, s_m \rangle = \max_{m' \in \mathcal{M}} \langle Y, s_{m'} \rangle \tag{26.84}$$

with ties (which occur with probability zero) being resolved by picking a random
message among those that attain the highest correlation.
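Rule (26.84) is a correlate-and-pick-the-largest receiver. The sketch below shows it in a discrete-time analogue where correlations are sums over samples; the function name and the example signals are hypothetical, and the random tie-break mirrors the description above.

```python
import random

def guess_message(y, signals, rng=random):
    """Return the 1-based index m whose signal has the highest correlation with y."""
    correlations = [sum(yk * sk for yk, sk in zip(y, s)) for s in signals]
    best = max(correlations)
    # Collect all messages attaining the maximum and break ties at random.
    winners = [m for m, c in enumerate(correlations, start=1) if c == best]
    return rng.choice(winners)

# Hypothetical orthogonal, equal-energy mean signals (energy 4 each).
signals = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
# A hypothetical observation: the third signal plus a small perturbation.
observation = [0.1, -0.2, 1.9]
decision = guess_message(observation, signals)
```

Because the signals here have equal energy and the prior is assumed uniform, no energy or prior correction terms are needed in the comparison.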


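We next address the probability of error of this optimal decision rule. We first
define the vector $(T^{(1)}, \ldots, T^{(\mathsf{M})})^{\mathsf{T}}$ by

$$T^{(\ell)} = \int_{-\infty}^{\infty} Y(t) \, \frac{s_\ell(t)}{\sqrt{E_s}} \, dt, \quad \ell \in \{1, \ldots, \mathsf{M}\}$$

and recast the decision rule as guessing “M = m” if $T^{(m)} = \max_{m' \in \mathcal{M}} T^{(m')}$, with
ties being resolved at random among the components of T that are maximal.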


Let $p_{\text{MAP}}(\text{error} \mid M = m)$ denote the conditional probability of error of this decoding
rule, conditional on M = m. Conditional on M = m, an error occurs in two cases:
when m does not attain the highest score or when m attains the highest score but
this score is also attained by some other message and the tie is not resolved in m’s
favor. Since the probability of a tie is zero (Note 26.6.4), we may ignore the second
case and only compute the probability that an incorrect message is assigned a score
that is (strictly) higher than the one associated with m. Thus,

$$p_{\text{MAP}}(\text{error} \mid M = m) = \Pr\bigl[ \max\bigl\{ T^{(1)}, \ldots, T^{(m-1)}, T^{(m+1)}, \ldots, T^{(\mathsf{M})} \bigr\} > T^{(m)} \bigm| M = m \bigr]. \tag{26.85}$$
From (26.28) and the orthogonality of the signals (26.83) we have that,
conditional on M = m, the random vector T is Gaussian with the mean of its m-th
component being $\sqrt{E_s}$, the mean of its other components being zero, and with all
the components being of variance $N_0/2$ and independent of each other. Thus, the
conditional probability of error given M = m is “the probability that at least one
of $\mathsf{M} - 1$ IID $\mathcal{N}(0, N_0/2)$ random variables exceeds the value of a $\mathcal{N}(\sqrt{E_s}, N_0/2)$
random variable that is independent of them.” Having recast the probability of
error conditional on M = m in a way that does not involve m (the clause in quotes
makes no reference to m), we conclude that the conditional probability of error
given that M = m does not depend on m:

$$p_{\text{MAP}}(\text{error} \mid M = m) = p_{\text{MAP}}(\text{error} \mid M = 1), \quad m \in \mathcal{M}. \tag{26.86}$$
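The clause in quotes translates directly into a simulation: draw one Gaussian with mean sqrt(Es), draw the competing zero-mean Gaussians, and count how often one of the latter is larger. The sketch below is only an illustration; the function names, the seed, and the trial count are hypothetical, and the result is a noisy estimate.

```python
import math
import random

def Q(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def simulate_orthogonal_keying(es, n0, m, trials, seed=0):
    """Estimate Pr[max of m-1 IID N(0, N0/2) exceeds an independent N(sqrt(Es), N0/2)]."""
    rng = random.Random(seed)
    sigma = math.sqrt(n0 / 2.0)
    errors = 0
    for _ in range(trials):
        t_correct = rng.gauss(math.sqrt(es), sigma)
        t_best_other = max(rng.gauss(0.0, sigma) for _ in range(m - 1))
        errors += (t_best_other > t_correct)
    return errors / trials
```

For m = 2 the estimate can be compared with Q(sqrt(Es/N0)), which is what (26.75) gives for two orthogonal signals of energy Es, since the energy of their difference is 2 Es.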
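This conditional probability of error can be computed starting from (26.85) as

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 1)
&= \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} > T^{(1)} \bigm| M = 1 \bigr] \\
&= 1 - \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le T^{(1)} \bigm| M = 1 \bigr] \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t) \, \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t \bigm| M = 1, T^{(1)} = t \bigr] \, dt \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t) \, \Pr\bigl[ \max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t \bigm| M = 1 \bigr] \, dt \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t) \, \Pr\bigl[ T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t \bigm| M = 1 \bigr] \, dt \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t) \, \Bigl( \Pr\bigl[ T^{(2)} \le t \bigm| M = 1 \bigr] \Bigr)^{\mathsf{M}-1} \, dt \\
&= 1 - \int_{-\infty}^{\infty} f_{T^{(1)}|M=1}(t) \, \Bigl( 1 - Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr) \Bigr)^{\mathsf{M}-1} \, dt \\
&= 1 - \int_{-\infty}^{\infty} \frac{1}{\sqrt{\pi N_0}} \, e^{-\frac{(t - \sqrt{E_s})^2}{N_0}} \Bigl( 1 - Q\Bigl( \frac{t}{\sqrt{N_0/2}} \Bigr) \Bigr)^{\mathsf{M}-1} \, dt \\
&= 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \Bigl( 1 - Q\Bigl( \tau + \sqrt{\frac{2 E_s}{N_0}} \Bigr) \Bigr)^{\mathsf{M}-1} \, d\tau,
\end{aligned} \tag{26.87}$$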

where the first equality follows from (26.85); the second because the conditional
probability of an event and its complement add to one; the third by conditioning
on $T^{(1)} = t$ and integrating it out, i.e., by noting that for any random variable $X$
of density $f_X(\cdot)$ and for any random variable $Y$,

$$\Pr[Y \le X] = \int_{-\infty}^{\infty} f_X(x)\, \Pr[Y \le x \mid X = x]\, dx, \tag{26.88}$$

with $X$ here being equal to $T^{(1)}$ and with $Y$ here being $\max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\}$; the
fourth from the conditional independence of $T^{(1)}$ and $(T^{(2)}, \ldots, T^{(\mathsf{M})})$ given $M = 1$,
which implies the conditional independence of $T^{(1)}$ and $\max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\}$ given
$M = 1$; the fifth because the maximum of random variables does not exceed $t$ if,
and only if, none of them exceeds $t$

$$\bigl(\max\{T^{(2)}, \ldots, T^{(\mathsf{M})}\} \le t\bigr) \iff \bigl(T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t\bigr);$$

the sixth because, conditional on $M = 1$, the random variables $T^{(2)}, \ldots, T^{(\mathsf{M})}$ are
IID so

$$\Pr\bigl[T^{(2)} \le t, \ldots, T^{(\mathsf{M})} \le t \bigm| M = 1\bigr] = \Bigl(\Pr\bigl[T^{(2)} \le t \bigm| M = 1\bigr]\Bigr)^{\mathsf{M}-1};$$

the seventh because, conditional on $M = 1$, we have $T^{(2)} \sim \mathcal{N}(0, N_0/2)$ and using
(19.12b); the eighth because, conditional on $M = 1$, we have $T^{(1)} \sim \mathcal{N}(\sqrt{E_s}, N_0/2)$
so its conditional density can be written explicitly using (19.6); and the final
equality using the change of variable


$$\tau \triangleq \frac{t - \sqrt{E_s}}{\sqrt{N_0/2}}. \tag{26.89}$$


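Using (26.86) and (26.87) we obtain that if $p^*(\text{error})$ denotes the unconditional
probability of error, then $p^*(\text{error}) = p_{\text{MAP}}(\text{error} \mid M = 1)$ and

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \biggl(1 - Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{\mathsf{M}-1} d\tau. \tag{26.90}$$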


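An alternative expression for the probability of error can be derived using the
Binomial Expansion

$$(a + b)^n = \sum_{j=0}^{n} \binom{n}{j} a^{n-j} b^{j}, \quad n \in \mathbb{N},\ a, b \in \mathbb{R}. \tag{26.91}$$

Substituting

$$a = 1, \quad b = -Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr), \quad n = \mathsf{M} - 1,$$

in (26.91) yields

$$\begin{aligned}
\biggl(1 - Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{\mathsf{M}-1}
&= \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} \biggl(Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{j} \\
&= 1 + \sum_{j=1}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} \biggl(Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{j},
\end{aligned}$$

from which we obtain from (26.90) (using the linearity of integration and the fact
that the Gaussian density integrates to one)

$$p^*(\text{error}) = \sum_{j=1}^{\mathsf{M}-1} (-1)^{j+1} \binom{\mathsf{M}-1}{j} \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \biggl(Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{j} d\tau.$$

For the case where $\mathsf{M} = 2$ the expression (26.90) for the probability of error can
be simplified to

$$p^*(\text{error}) = Q\biggl(\sqrt{\frac{E_s}{N_0}}\biggr), \quad \mathsf{M} = 2, \tag{26.93}$$

as we proceed to show in two different ways. The first way is to note that for
$\mathsf{M} = 2$ the probability of error can be expressed, using (26.85) and (26.86), as

$$\begin{aligned}
p_{\text{MAP}}(\text{error} \mid M = 1)
&= \Pr\bigl[T^{(2)} > T^{(1)} \bigm| M = 1\bigr] \\
&= \Pr\bigl[T^{(2)} - T^{(1)} > 0 \bigm| M = 1\bigr] \\
&= Q\biggl(\sqrt{\frac{E_s}{N_0}}\biggr),
\end{aligned}$$

where the last equality follows because, conditional on $M = 1$, the random variables
$T^{(1)}$ and $T^{(2)}$ are independent Gaussians of variance $N_0/2$ with the first
having mean $\sqrt{E_s}$ and the second having zero mean, so their difference $T^{(2)} - T^{(1)}$
is $\mathcal{N}(-\sqrt{E_s}, N_0)$. (The probability that a $\mathcal{N}(-\sqrt{E_s}, N_0)$ RV exceeds zero can be
computed using (19.12a).) The second way of showing (26.93) is to use (26.75) and
to note that the orthogonality of $\mathbf{s}_1$ and $\mathbf{s}_2$ implies
$\|\mathbf{s}_1 - \mathbf{s}_2\|_2^2 = \|\mathbf{s}_1\|_2^2 + \|\mathbf{s}_2\|_2^2 = 2E_s$.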
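The expressions above lend themselves to a quick numerical sanity check. The following is a minimal sketch, not part of the original derivation: it evaluates (26.90) with a composite Simpson rule, compares it with the alternating binomial sum, and checks the $\mathsf{M} = 2$ case against (26.93). The function names, the truncation of the integral to $[-12, 12]$, and the grid size are assumptions made for illustration.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def gaussian_average(g) -> float:
    """Approximate (1/sqrt(2*pi)) * integral of exp(-tau^2/2) * g(tau).

    The integral is truncated to [-12, 12] (an assumption of this sketch);
    for |g| <= 1 the neglected tails are far below double precision.
    """
    def weighted(tau):
        return math.exp(-0.5 * tau * tau) * g(tau)
    return simpson(weighted, -12.0, 12.0) / math.sqrt(2.0 * math.pi)


def p_error_direct(num_msgs: int, es_over_n0: float) -> float:
    """Numerical evaluation of (26.90) for num_msgs orthogonal signals."""
    shift = math.sqrt(2.0 * es_over_n0)
    return 1.0 - gaussian_average(
        lambda tau: (1.0 - q_func(tau + shift)) ** (num_msgs - 1))


def p_error_binomial(num_msgs: int, es_over_n0: float) -> float:
    """Numerical evaluation of the alternating sum obtained from (26.91)."""
    shift = math.sqrt(2.0 * es_over_n0)
    total = 0.0
    for j in range(1, num_msgs):
        term = gaussian_average(lambda tau, j=j: q_func(tau + shift) ** j)
        total += (-1) ** (j + 1) * math.comb(num_msgs - 1, j) * term
    return total
```

Calling `p_error_direct(2, es_over_n0)` should agree with `q_func(math.sqrt(es_over_n0))` up to the integration error, which is the content of (26.93).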


26.11.4 The M-ary Simplex 


We next describe a detection problem that is intimately related to the problem we 
addressed in Section 26.11.3. To motivate the problem we first note: 


Proposition 26.11.2. Consider the setup described in Section 26.2. If $\mathbf{s}$ is any
integrable signal that is bandlimited to W Hz, then the probability of error associated
with the mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}\}$ and the prior $\{\pi_m\}$ is the same as with the mean
signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_{\mathsf{M}} - \mathbf{s}\}$ and the same prior.


Proof. We have essentially given a proof of this result in Section 14.3 and also
in Section 26.11.1 in our analysis of nonantipodal signaling. The idea is that,
by subtracting the signal $\mathbf{s}$ from the received waveform, the receiver can make the
problem with mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}\}$ appear as though it were the problem with
mean signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_{\mathsf{M}} - \mathbf{s}\}$. Conversely, by adding $\mathbf{s}$, the receiver can make
the latter appear as though it were the former. Consequently, the best performance
achievable in the two settings must be identical.


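The expected transmitted energy when employing the mean signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}\}$
may be different than when employing the mean signals $\{\mathbf{s}_1 - \mathbf{s}, \ldots, \mathbf{s}_{\mathsf{M}} - \mathbf{s}\}$. In
subtracting the signal $\mathbf{s}$ one can change the average transmitted energy for better
or worse. As we argued in Section 14.3, to minimize the expected transmitted
energy, one should choose $\mathbf{s}$ to correspond to the “center of gravity” of the mean
signals:

Proposition 26.11.3. Let the prior $\{\pi_m\}$ and mean signals $\{\mathbf{s}_m\}$ be given. Let

$$\mathbf{s}_* = \sum_{m \in \mathcal{M}} \pi_m\, \mathbf{s}_m. \tag{26.94}$$

Then, for any energy-limited signal $\mathbf{s}$

$$\sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2 \le \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}\|_2^2, \tag{26.95}$$

with equality if, and only if, $\mathbf{s}$ is indistinguishable from $\mathbf{s}_*$.


Proof. Writing $\mathbf{s}_m - \mathbf{s}$ as $(\mathbf{s}_m - \mathbf{s}_*) + (\mathbf{s}_* - \mathbf{s})$ we have

$$\begin{aligned}
\sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}\|_2^2
&= \sum_{m \in \mathcal{M}} \pi_m \|(\mathbf{s}_m - \mathbf{s}_*) + (\mathbf{s}_* - \mathbf{s})\|_2^2 \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2 + \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_* - \mathbf{s}\|_2^2 + 2 \sum_{m \in \mathcal{M}} \pi_m \langle \mathbf{s}_m - \mathbf{s}_*, \mathbf{s}_* - \mathbf{s} \rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2 + \|\mathbf{s}_* - \mathbf{s}\|_2^2 + 2 \Bigl\langle \sum_{m \in \mathcal{M}} \pi_m (\mathbf{s}_m - \mathbf{s}_*),\, \mathbf{s}_* - \mathbf{s} \Bigr\rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2 + \|\mathbf{s}_* - \mathbf{s}\|_2^2 + 2 \langle \mathbf{s}_* - \mathbf{s}_*,\, \mathbf{s}_* - \mathbf{s} \rangle \\
&= \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2 + \|\mathbf{s}_* - \mathbf{s}\|_2^2 \\
&\ge \sum_{m \in \mathcal{M}} \pi_m \|\mathbf{s}_m - \mathbf{s}_*\|_2^2,
\end{aligned}$$

with the inequality being an equality if, and only if, $\|\mathbf{s}_* - \mathbf{s}\|_2 = 0$.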
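Proposition 26.11.3 can be illustrated in finite dimensions. The sketch below is an illustration under assumptions, not the text's own construction: signals are replaced by vectors in $\mathbb{R}^5$, and the prior and the random seed are arbitrary. It checks that the weighted energy around the center of gravity is never exceeded by that around any other point, and that the gap equals $\|\mathbf{s}_* - \mathbf{s}\|^2$, as the proof shows.

```python
import random


def weighted_energy(signals, prior, s):
    """Sum over m of prior[m] * ||signals[m] - s||^2 for vectors in R^n."""
    return sum(p * sum((a - b) ** 2 for a, b in zip(sm, s))
               for p, sm in zip(prior, signals))


def center_of_gravity(signals, prior):
    """Component-wise prior-weighted average, the analogue of (26.94)."""
    n = len(signals[0])
    return [sum(p * sm[k] for p, sm in zip(prior, signals)) for k in range(n)]


rng = random.Random(0)                        # arbitrary seed
signals = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(4)]
prior = [0.1, 0.2, 0.3, 0.4]                  # hypothetical prior
s_star = center_of_gravity(signals, prior)
best = weighted_energy(signals, prior, s_star)

# Energy gap for a randomly chosen alternative s.
s_other = [rng.gauss(0.0, 1.0) for _ in range(5)]
gap = weighted_energy(signals, prior, s_other) - best
dist_sq = sum((a - b) ** 2 for a, b in zip(s_star, s_other))
```

Here `gap` and `dist_sq` should coincide up to rounding, mirroring the second-to-last line of the proof.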


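We can now construct the simplex signals as follows. We start with $\mathsf{M}$ orthonormal
waveforms $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\mathsf{M}}$

$$\langle \boldsymbol{\phi}_{m'}, \boldsymbol{\phi}_{m''} \rangle = I\{m' = m''\}, \quad m', m'' \in \mathcal{M}, \tag{26.96}$$

that are integrable and bandlimited to W Hz. We set $\bar{\boldsymbol{\phi}}$ to be their “center of
gravity” with respect to the uniform prior

$$\bar{\boldsymbol{\phi}} = \frac{1}{\mathsf{M}} \sum_{m \in \mathcal{M}} \boldsymbol{\phi}_m. \tag{26.97}$$

Using (26.96), (26.97), and the basic properties of the inner product (3.6)–(3.10)
it is easily verified that

$$\langle \boldsymbol{\phi}_{m'} - \bar{\boldsymbol{\phi}},\, \boldsymbol{\phi}_{m''} - \bar{\boldsymbol{\phi}} \rangle = I\{m' = m''\} - \frac{1}{\mathsf{M}}, \quad m', m'' \in \mathcal{M}. \tag{26.98}$$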




Figure 26.6: Starting with two orthonormal signals and subtracting the “center of 
gravity” from each we obtain two antipodal signals. Scaling these antipodal signals 
results in the simplex constellation with two signals. 




Figure 26.7: Constructing the simplex constellation with three points from three 
orthonormal signals. Left figure depicts the orthonormal constellation and its cen- 
ter of gravity; middle figure depicts the result of subtracting the center of gravity, 
and the right figure depicts the result of scaling (from a different perspective). 


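We now define the $\mathsf{M}$-ary simplex constellation with energy $E_s$ by

$$\mathbf{s}_m = \sqrt{E_s}\, \sqrt{\frac{\mathsf{M}}{\mathsf{M}-1}}\, \bigl(\boldsymbol{\phi}_m - \bar{\boldsymbol{\phi}}\bigr), \quad m \in \mathcal{M}. \tag{26.99}$$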


The construction for the case where M = 2 is depicted in Figure 26.6. It yields 
the binary antipodal signaling scheme. The construction for M = 3 is depicted in 
Figure 26.7. 


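From (26.99) and (26.98) we obtain for distinct $m', m'' \in \mathcal{M}$

$$\|\mathbf{s}_{m'}\|_2^2 = E_s \quad \text{and} \quad \langle \mathbf{s}_{m'}, \mathbf{s}_{m''} \rangle = -\frac{E_s}{\mathsf{M}-1}. \tag{26.100}$$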
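The construction (26.99) and the identities (26.100) can be checked in coordinates. This is a minimal sketch, assuming that each waveform is represented by its coefficient vector with respect to the orthonormal tuple $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{\mathsf{M}})$, so that $\boldsymbol{\phi}_m$ becomes the $m$-th standard basis vector; the function names are hypothetical.

```python
import math


def inner(a, b):
    """Euclidean inner product, standing in for the signal inner product."""
    return sum(x * y for x, y in zip(a, b))


def simplex_constellation(num_msgs: int, es: float):
    """Coefficient vectors of the simplex signals (26.99).

    Row m holds sqrt(es) * sqrt(M/(M-1)) * (e_m - (1/M) * (1, ..., 1)),
    where e_m represents phi_m and (1/M) * (1, ..., 1) represents phi-bar.
    """
    scale = math.sqrt(es) * math.sqrt(num_msgs / (num_msgs - 1))
    return [[scale * ((1.0 if k == m else 0.0) - 1.0 / num_msgs)
             for k in range(num_msgs)]
            for m in range(num_msgs)]


signals = simplex_constellation(4, 2.0)
energy = inner(signals[0], signals[0])        # expected: Es
cross = inner(signals[0], signals[1])         # expected: -Es / (M - 1)
```

For $\mathsf{M} = 4$ and $E_s = 2$ the two quantities should come out as $2$ and $-2/3$, and the signals sum to the zero vector because their center of gravity has been removed.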


Also, from (26.99) we see that $\{\mathbf{s}_m\}$ can be viewed as the result of subtracting the
center of gravity from orthogonal signals of energy $E_s\, \mathsf{M}/(\mathsf{M}-1)$. Consequently,
the least error probability that can be achieved in detecting simplex signals of
energy $E_s$ is the same as the least error probability that can be achieved in detecting
orthogonal signals of energy

$$\frac{\mathsf{M}}{\mathsf{M}-1}\, E_s \tag{26.101}$$

(Proposition 26.11.2). From the expression for the error probability in orthogonal
signaling (26.90) we obtain for the simplex signals with a uniform prior

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\tau^2/2} \biggl(1 - Q\Bigl(\tau + \sqrt{\frac{\mathsf{M}}{\mathsf{M}-1}\, \frac{2E_s}{N_0}}\Bigr)\biggr)^{\mathsf{M}-1} d\tau. \tag{26.102}$$

Figure 26.8: Adding a properly scaled signal $\boldsymbol{\psi}$ that is orthogonal to all the
elements of a simplex constellation results in an orthogonal constellation.


The decision rule for the simplex constellation can also be derived by exploiting the
relationship to orthogonal keying. For example, if $\boldsymbol{\psi}$ is a unit-energy integrable
signal that is bandlimited to W Hz and that is orthogonal to the signals $\{\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}\}$,
then, by (26.100), the waveforms

$$\Bigl\{\mathbf{s}_m + \frac{1}{\sqrt{\mathsf{M}-1}}\, \sqrt{E_s}\, \boldsymbol{\psi}\Bigr\}_{m \in \mathcal{M}} \tag{26.103}$$

are orthogonal, each of energy $E_s\, \mathsf{M}/(\mathsf{M}-1)$. (See Figure 26.8 for a demonstration
of the process of obtaining an orthogonal constellation with $\mathsf{M} = 2$ signals
by adding a signal $\boldsymbol{\psi}$ to each of the signals in a binary simplex constellation.)
Consequently, in order to decode the simplex signals contaminated by white Gaussian
noise with respect to the bandwidth W, we can add $\frac{1}{\sqrt{\mathsf{M}-1}} \sqrt{E_s}\, \boldsymbol{\psi}$ to the
received waveform and then feed the result to an optimal detector for orthogonal
keying.
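The claim about (26.103) can be verified in coordinates. In the sketch below, which is an illustration under assumptions, the simplex signals are represented by coefficient vectors as in (26.99), and the unit-energy signal orthogonal to all of them is modeled by one extra coordinate; the helper names are hypothetical.

```python
import math


def inner(a, b):
    """Euclidean inner product, standing in for the signal inner product."""
    return sum(x * y for x, y in zip(a, b))


def simplex_constellation(num_msgs: int, es: float):
    """Coefficient vectors of the simplex signals of energy es."""
    scale = math.sqrt(es) * math.sqrt(num_msgs / (num_msgs - 1))
    return [[scale * ((1.0 if k == m else 0.0) - 1.0 / num_msgs)
             for k in range(num_msgs)]
            for m in range(num_msgs)]


def add_scaled_psi(signals, es: float):
    """Append the coordinate of sqrt(es / (M - 1)) * psi to every signal.

    psi is modeled as a new unit vector, hence orthogonal to all signals.
    """
    coeff = math.sqrt(es / (len(signals) - 1))
    return [s + [coeff] for s in signals]


es = 1.5
orthogonal = add_scaled_psi(simplex_constellation(3, es), es)
cross = inner(orthogonal[0], orthogonal[1])       # expected: 0
energy = inner(orthogonal[0], orthogonal[0])      # expected: es * M / (M - 1)
```

With $\mathsf{M} = 3$ and $E_s = 1.5$ the shifted signals should be mutually orthogonal with energy $2.25$.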


26.11.5 Bi-Orthogonal Keying 


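Starting with an orthogonal constellation, we can double the number of signals
without reducing the minimum distance. This construction, which results in the
“bi-orthogonal signal set,” is the topic of this section. To construct the bi-orthogonal
signal set with $2\kappa$ signals, we start with $\kappa \ge 1$ orthonormal signals $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\kappa)$
and define the $2\kappa$ bi-orthogonal signal set $\{\mathbf{s}_{1,u}, \mathbf{s}_{1,d}, \ldots, \mathbf{s}_{\kappa,u}, \mathbf{s}_{\kappa,d}\}$ by

$$\mathbf{s}_{\nu,u} = +\sqrt{E_s}\, \boldsymbol{\phi}_\nu \quad \text{and} \quad \mathbf{s}_{\nu,d} = -\sqrt{E_s}\, \boldsymbol{\phi}_\nu, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.104}$$

We can think of “u” as standing for “up” and of “d” as standing for “down,” so to
each signal $\boldsymbol{\phi}_\nu$ there correspond two signals in the bi-orthogonal constellation: the
“up signal” that corresponds to multiplying $\sqrt{E_s}\, \boldsymbol{\phi}_\nu$ by $+1$ and the “down signal”
that corresponds to multiplying $\sqrt{E_s}\, \boldsymbol{\phi}_\nu$ by $-1$. Only bi-orthogonal signal sets with
an even number of signals are defined. The constructed signals are all of energy $E_s$:

$$\|\mathbf{s}_{\nu,u}\|_2^2 = \|\mathbf{s}_{\nu,d}\|_2^2 = E_s, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.105}$$

A bi-orthogonal constellation with six points ($\kappa = 3$) is depicted in Figure 26.9.

Figure 26.9: A bi-orthogonal constellation with six signals.

Suppose that each of the signals $\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_\kappa$ is an integrable signal that is
bandlimited to W Hz and that, consequently, so are all the signals in the constructed
bi-orthogonal signal set. A signal is picked uniformly at random from the signal set
and is sent over a channel. We observe the stochastic process $(Y(t))$ given by the
sum of the transmitted signal and white Gaussian noise of PSD $N_0/2$ with respect
to the bandwidth W. How should we guess which signal was sent?


Since the signal was chosen equiprobably, and since all the signals in the signal set
are of the same energy, it is optimal to consider the inner products

$$\langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle,\ \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle,\ \ldots,\ \langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle,\ \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle \tag{26.106}$$

and to pick the signal in the signal set corresponding to the largest of these inner
products. By (26.104) we have for every $\nu \in \{1, \ldots, \kappa\}$ that $\mathbf{s}_{\nu,u} = -\mathbf{s}_{\nu,d}$ so
$\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle = -\langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle$ and hence

$$\max\bigl\{\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle\bigr\} = \bigl|\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle\bigr|, \quad \nu \in \{1, \ldots, \kappa\}. \tag{26.107}$$

Equivalently, by (26.104),

$$\max\bigl\{\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle\bigr\} = \sqrt{E_s}\, \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr|, \quad \nu \in \{1, \ldots, \kappa\}.$$

To find the maximum of the $2\kappa$ terms in (26.106) we can first compute for each
$\nu \in \{1, \ldots, \kappa\}$ the maximum between $\langle \mathbf{Y}, \mathbf{s}_{\nu,u} \rangle$ and $\langle \mathbf{Y}, \mathbf{s}_{\nu,d} \rangle$ and then compute the
maximum of the $\kappa$ results:

$$\begin{aligned}
&\max\bigl\{\langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle, \ldots, \langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle\bigr\} \\
&\qquad = \max\Bigl\{\max\bigl\{\langle \mathbf{Y}, \mathbf{s}_{1,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{1,d} \rangle\bigr\}, \ldots, \max\bigl\{\langle \mathbf{Y}, \mathbf{s}_{\kappa,u} \rangle, \langle \mathbf{Y}, \mathbf{s}_{\kappa,d} \rangle\bigr\}\Bigr\}.
\end{aligned}$$

Using this approach, we obtain from (26.107) the following optimal two-step procedure:
first find which $\nu^*$ in $\{1, \ldots, \kappa\}$ attains the maximum of the absolute values
of the inner products

$$\max_{1 \le \nu \le \kappa} \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr|$$

and then, after you have found $\nu^*$, guess “$\mathbf{s}_{\nu^*,u}$” if $\langle \mathbf{Y}, \boldsymbol{\phi}_{\nu^*} \rangle > 0$ and guess “$\mathbf{s}_{\nu^*,d}$”
if $\langle \mathbf{Y}, \boldsymbol{\phi}_{\nu^*} \rangle < 0$.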
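The two-step procedure can be sketched as follows. This is an illustration under assumptions: the inner products $\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle$ are taken as already computed and passed in as a list, the function names are hypothetical, and a zero inner product (an event of probability zero) is arbitrarily mapped to “d”. A brute-force search over the $2\kappa$ inner products of (26.106) is included for comparison.

```python
import math


def decode_two_step(phi_inner_products):
    """Return (nu_star, direction) with nu_star in 1..kappa and 'u' or 'd'.

    phi_inner_products[nu - 1] holds the inner product of Y with phi_nu.
    """
    kappa = len(phi_inner_products)
    # Step 1: index attaining the largest absolute inner product.
    best = max(range(kappa), key=lambda v: abs(phi_inner_products[v]))
    # Step 2: the sign decides between the "up" and the "down" signal.
    direction = "u" if phi_inner_products[best] > 0 else "d"
    return best + 1, direction


def decode_brute_force(phi_inner_products, es: float = 1.0):
    """Pick the largest of the 2 * kappa inner products of (26.106)."""
    candidates = []
    for nu, ip in enumerate(phi_inner_products, start=1):
        candidates.append((math.sqrt(es) * ip, (nu, "u")))
        candidates.append((-math.sqrt(es) * ip, (nu, "d")))
    return max(candidates)[1]
```

For instance, `decode_two_step([0.3, -1.2, 0.7])` returns `(2, "d")`, because the second inner product has the largest magnitude and is negative.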


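We next compute the probability of error of this optimal guessing rule. It is not
difficult to see that the conditional probability of error does not depend on the
message we condition on. For concreteness, we shall analyze the probability of
error associated with the message corresponding to the signal $\mathbf{s}_{1,u}$, a probability
that we denote by $p_{\text{MAP}}(\text{error} \mid \mathbf{s}_{1,u})$, with the corresponding conditional probability
of correct decoding $p_{\text{MAP}}(\text{correct} \mid \mathbf{s}_{1,u}) = 1 - p_{\text{MAP}}(\text{error} \mid \mathbf{s}_{1,u})$. To simplify the
typesetting, we shall denote the conditional probability of the event $A$ given that
$\mathbf{s}_{1,u}$ is sent by $\Pr(A \mid \mathbf{s}_{1,u})$.


Since the probability of ties in the likelihood function is zero (Note 26.6.4)

$$\begin{aligned}
p_{\text{MAP}}(\text{correct} \mid \mathbf{s}_{1,u})
&= \Pr\Bigl(-\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \ \text{and} \ \max_{2 \le \nu \le \kappa} \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr| \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \Bigm| \mathbf{s}_{1,u}\Bigr) \\
&= \Pr\Bigl(\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \ge 0 \ \text{and} \ \max_{2 \le \nu \le \kappa} \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr| \le \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \Bigm| \mathbf{s}_{1,u}\Bigr) \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle | \mathbf{s}_{1,u}}(t)\, \Pr\Bigl[\max_{2 \le \nu \le \kappa} \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr| \le t \Bigm| \mathbf{s}_{1,u},\, \langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle = t\Bigr]\, dt \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle | \mathbf{s}_{1,u}}(t)\, \Pr\Bigl[\max_{2 \le \nu \le \kappa} \bigl|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle\bigr| \le t \Bigm| \mathbf{s}_{1,u}\Bigr]\, dt \\
&= \int_0^\infty f_{\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle | \mathbf{s}_{1,u}}(t)\, \Bigl(\Pr\bigl[\,|\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle| \le t \bigm| \mathbf{s}_{1,u}\bigr]\Bigr)^{\kappa-1} dt \\
&= \int_0^\infty \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(t - \sqrt{E_s})^2}{N_0}} \biggl(1 - 2Q\Bigl(\frac{t}{\sqrt{N_0/2}}\Bigr)\biggr)^{\kappa-1} dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2} \biggl(1 - 2Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{\kappa-1} d\tau,
\end{aligned} \tag{26.108}$$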

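with the following justification. The first equality follows from the definition of
our optimal decoder and from the fact that ties occur with probability zero. The
second equality follows by trivial algebra ($-\xi \le \xi$ if, and only if, $\xi \ge 0$). The third
equality follows by conditioning on $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle$ being equal to $t$ and integrating $t$
out while noting that a correct decision can only be made if $t \ge 0$, in which case
the condition $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \ge 0$ is satisfied automatically. The fourth equality follows
because, conditional on the signal $\mathbf{s}_{1,u}$ being sent, the random variable $\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle$ is
independent of the random variables $\{|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle|\}_{2 \le \nu \le \kappa}$. The fifth equality follows
because, conditional on $\mathbf{s}_{1,u}$ being sent, the random variables $\{|\langle \mathbf{Y}, \boldsymbol{\phi}_\nu \rangle|\}_{2 \le \nu \le \kappa}$ are
IID. The sixth equality follows because, conditional on $\mathbf{s}_{1,u}$ being sent, we have
$\langle \mathbf{Y}, \boldsymbol{\phi}_1 \rangle \sim \mathcal{N}(\sqrt{E_s}, N_0/2)$ and $\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle \sim \mathcal{N}(0, N_0/2)$, so

$$\begin{aligned}
\Pr\bigl[\,|\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle| \le t \bigm| \mathbf{s}_{1,u}\bigr]
&= \Pr\biggl[\frac{|\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle|}{\sqrt{N_0/2}} \le \frac{t}{\sqrt{N_0/2}} \biggm| \mathbf{s}_{1,u}\biggr] \\
&= 1 - 2 \Pr\biggl[\frac{\langle \mathbf{Y}, \boldsymbol{\phi}_2 \rangle}{\sqrt{N_0/2}} > \frac{t}{\sqrt{N_0/2}} \biggm| \mathbf{s}_{1,u}\biggr] \\
&= 1 - 2Q\Bigl(\frac{t}{\sqrt{N_0/2}}\Bigr).
\end{aligned}$$

Finally, (26.108) follows from the substitution $\tau = (t - \sqrt{E_s})/\sqrt{N_0/2}$ as in (26.89).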


Since the conditional probability of error does not depend on the message, it follows
that all conditional probabilities of error are equal to the average probability of
error $p^*(\text{error})$ and

$$p^*(\text{error}) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2} \biggl(1 - 2Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{\kappa-1} d\tau, \tag{26.109}$$

or, using the Binomial Expansion (26.91) with the substitution of $-2Q\bigl(\tau + \sqrt{2E_s/N_0}\bigr)$
for $b$ and of $1$ for $a$,

$$p^*(\text{error}) = 1 + \sum_{j=0}^{\kappa-1} (-1)^{j+1}\, 2^j \binom{\kappa-1}{j} \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{2E_s/N_0}}^{\infty} e^{-\tau^2/2} \biggl(Q\Bigl(\tau + \sqrt{\frac{2E_s}{N_0}}\Bigr)\biggr)^{j} d\tau. \tag{26.110}$$

The probability of error associated with an orthogonal constellation with $\kappa$ signals
is better than that of the bi-orthogonal constellation with $2\kappa$ signals and equal
average energy. But the comparison is not quite fair because the bi-orthogonal
constellation is richer.
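The comparison between orthogonal and bi-orthogonal keying can be made concrete numerically. The following is a sketch under assumptions (hypothetical function names, a composite Simpson rule, and an upper integration limit of 12 in place of infinity) that evaluates (26.109). Two special cases serve as checks: for $\kappa = 1$ the constellation is binary antipodal, and for $\kappa = 2$ a correct decision requires $T^{(1)} - T^{(2)} > 0$ and $T^{(1)} + T^{(2)} > 0$, two independent events, so the error probability is $2q - q^2$ with $q = Q(\sqrt{E_s/N_0})$.

```python
import math


def q_func(x: float) -> float:
    """Gaussian tail probability Q(x) = Pr[N(0, 1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simpson(f, lo: float, hi: float, n: int = 4000) -> float:
    """Composite Simpson rule on [lo, hi] with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def p_error_biorthogonal(kappa: int, es_over_n0: float) -> float:
    """Numerical evaluation of (26.109) for 2 * kappa bi-orthogonal signals."""
    shift = math.sqrt(2.0 * es_over_n0)

    def integrand(tau):
        inner = 1.0 - 2.0 * q_func(tau + shift)
        return math.exp(-0.5 * tau * tau) * inner ** (kappa - 1)

    # The upper limit 12 replaces infinity; the neglected tail is negligible.
    return 1.0 - simpson(integrand, -shift, 12.0) / math.sqrt(2.0 * math.pi)
```

At $E_s/N_0 = 4$, two orthogonal signals give $Q(2) \approx 0.0228$ by (26.93), whereas `p_error_biorthogonal(2, 4.0)` is roughly twice that, in line with the remark that the smaller orthogonal constellation has the lower error probability.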


26.12 Detection in Colored Noise 


Our focus throughout has been on the detection problem when the noise is “white”
in the sense that its PSD is flat over the frequency band to which the mean signals
are limited. We now extend the discussion to “colored” noise, i.e., to noise
whose PSD is not constant over the bandwidth of interest. We continue to assume
that the mean signals $\{\mathbf{s}_m\}_{m \in \mathcal{M}}$ are integrable signals that are bandlimited
to W Hz and that the noise $(N(t))$ is independent of the message $M$ and is a
measurable, stationary, Gaussian SP. Its PSD $S_{NN}$, however, is now an arbitrary
nonnegative, symmetric, integrable function that is not necessarily constant over
the band $[-W, W]$. Conditional on $M = m$, the received waveform $(Y(t))$ is given
at time $t$ by $s_m(t) + N(t)$.

Our approach is based on “whitening the noise” and is only applicable when the
noise can be whitened with respect to the bandwidth W, i.e., when there exists a
whitening filter for the noise with respect to W:


Definition 26.12.1 (Whitening Filter for $S_{NN}$ with respect to W). A filter of
impulse response $h\colon \mathbb{R} \to \mathbb{R}$ is said to be a whitening filter for $S_{NN}$ (or for $(N(t))$)
with respect to the bandwidth W if it is stable and its frequency response $\hat{h}$
satisfies

$$S_{NN}(f)\, |\hat{h}(f)|^2 = 1, \quad |f| \le W. \tag{26.111}$$


Only the magnitude of the frequency response of the whitening filter is specified in
(26.111) and only for frequencies in the band $[-W, W]$. The response is unspecified
outside this band. Consequently:


Note 26.12.2. There may be many different whitening filters for $S_{NN}$ with respect
to the bandwidth W.
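Note 26.12.2 can be illustrated by exhibiting two frequency responses that both satisfy (26.111). The sketch below rests on assumptions: the PSD is a made-up example that is strictly positive on the band, the bandwidth is set to 1 Hz, and the two responses differ by the linear phase of a pure delay. It only checks the magnitude condition on a grid inside the band; stability of the corresponding filters and their behavior outside the band are not examined.

```python
import cmath
import math

W = 1.0  # hypothetical bandwidth in Hz


def psd(f: float) -> float:
    """Hypothetical noise PSD: symmetric and strictly positive on [-W, W]."""
    return 1.0 + 0.5 * math.cos(math.pi * f / W)


def whitening_response(f: float, delay: float = 0.0) -> complex:
    """A frequency response meeting (26.111) on the band.

    The magnitude is 1 / sqrt(psd(f)); `delay` (in seconds) changes only
    the phase, so every value of `delay` yields another admissible response.
    """
    return cmath.exp(-2j * math.pi * f * delay) / math.sqrt(psd(f))


grid = [-W + 2.0 * W * k / 200 for k in range(201)]
residuals = [abs(psd(f) * abs(whitening_response(f, d)) ** 2 - 1.0)
             for f in grid for d in (0.0, 0.3)]
```

All entries of `residuals` should be at rounding level, even though the two responses differ in phase at every nonzero frequency.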


If $S_{NN}$ is zero at some frequencies in $[-W, W]$, then there is no whitening filter
for $S_{NN}$ with respect to W. Likewise, a whitening filter for $S_{NN}$ does not exist
if $S_{NN}$ is not continuous in $[-W, W]$ (because the frequency response of a stable
filter must be continuous (Theorem 6.2.11), and if $S_{NN}$ is discontinuous, then so is
$f \mapsto 1/\sqrt{S_{NN}(f)}$). Thus:


Note 26.12.3. There does not always exist a whitening filter for $S_{NN}$ with respect
to W.


We shall see, however, in Proposition 26.12.8 that a whitening filter exists whenever
throughout the interval $[-W, W]$ the PSD $S_{NN}$ is strictly positive and is twice
continuously differentiable.


The filter is called “whitening” because, by Theorem 25.13.2, we have: 


Proposition 26.12.4. If $(N(t),\ t \in \mathbb{R})$ is a measurable, stationary, Gaussian SP
of PSD $S_{NN}$, and if $h$ is the impulse response of a whitening filter for $S_{NN}$ with
respect to W, then $(N(t)) \star h$ is white Gaussian noise of PSD 1 with respect to the
bandwidth W.


Assuming that the noise can be whitened with respect to the bandwidth W, we
pick some whitening filter of impulse response $h$ and denote by $(\tilde{Y}(t))$ the result
of feeding the observed SP $(Y(t))$ to this filter:

$$\bigl(\tilde{Y}(t)\bigr) = \bigl(Y(t)\bigr) \star h. \tag{26.112}$$

Conditional on $M = m$, the output of the whitening filter is given by

$$\tilde{Y}(t) = \bigl((\mathbf{s}_m + \mathbf{N}) \star h\bigr)(t) = \tilde{s}_m(t) + \tilde{N}(t), \quad t \in \mathbb{R}, \tag{26.113}$$

where

$$\tilde{\mathbf{s}}_m = \mathbf{s}_m \star h, \quad m \in \mathcal{M}, \tag{26.114}$$

and

$$\bigl(\tilde{N}(t)\bigr) = \bigl(N(t)\bigr) \star h. \tag{26.115}$$

By Proposition 6.5.2, $\tilde{\mathbf{s}}_m$ is an integrable signal that is bandlimited to W Hz and

$$\tilde{s}_m(t) = \int_{-W}^{W} \hat{s}_m(f)\, \hat{h}(f)\, e^{i 2\pi f t}\, df, \quad t \in \mathbb{R}. \tag{26.116}$$

And, by Proposition 26.12.4, $(\tilde{N}(t))$ is white Gaussian noise of PSD 1 with respect
to the bandwidth W.

Loosely speaking, the main result of this section is that there is no loss in optimality
in guessing $M$ based on the whitening filter’s output $(\tilde{Y}(t))$. This is not very
surprising for the following reason. While passing $(Y(t))$ through the whitening
filter is not necessarily an invertible operation, it “almost” is, in the sense that we
can recover the original observation inside the band $[-W, W]$. Since the transmitted
signals are bandlimited to W Hz, we do not expect that the observation outside
this band will influence our guess.


Once this result is proved, the detection problem is reduced to detecting known
signals (the signals $\{\tilde{\mathbf{s}}_m\}$) in white Gaussian noise (the SP $(\tilde{N}(t))$). Employing
Theorem 26.4.1, we obtain that if guessing $M$ based on the whitening filter’s output
is optimal, then so is basing one’s guess on the inner products vector

$$\bigl(\langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_{\mathsf{M}} \rangle\bigr)^{\mathsf{T}}, \tag{26.117}$$

thus reducing the continuous-time detection problem to one where the observation
is a random vector taking value in $\mathbb{R}^{\mathsf{M}}$.


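We next describe the sufficient statistic for our problem more carefully. Rather than
expressing the sufficient statistic as in (26.117), we prefer to express it directly in
terms of the observed signal $(Y(t))$ as the vector

$$\bigl(\langle \mathbf{Y}, \overleftarrow{h} \star \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \mathbf{Y}, \overleftarrow{h} \star \tilde{\mathbf{s}}_{\mathsf{M}} \rangle\bigr)^{\mathsf{T}}, \tag{26.118}$$

where $\overleftarrow{h}$ denotes the mirror image $t \mapsto h(-t)$ of $h$, and where the equivalence
of the two forms can be formally derived as follows:

$$\begin{aligned}
\langle \tilde{\mathbf{Y}}, \tilde{\mathbf{s}}_m \rangle
&= \int_{-\infty}^{\infty} \biggl(\int_{-\infty}^{\infty} Y(\sigma)\, h(t - \sigma)\, d\sigma\biggr) \tilde{s}_m(t)\, dt \\
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} Y(\sigma)\, h(t - \sigma)\, \tilde{s}_m(t)\, dt\, d\sigma \\
&= \int_{-\infty}^{\infty} Y(\sigma) \biggl(\int_{-\infty}^{\infty} h(t - \sigma)\, \tilde{s}_m(t)\, dt\biggr) d\sigma \\
&= \langle \mathbf{Y}, \overleftarrow{h} \star \tilde{\mathbf{s}}_m \rangle.
\end{aligned}$$

Note that for each $m \in \mathcal{M}$ the convolution $\overleftarrow{h} \star \tilde{\mathbf{s}}_m$ is the result of passing the
signal $\tilde{\mathbf{s}}_m$, which is an integrable signal that is bandlimited to W Hz, through
the stable filter of impulse response $\overleftarrow{h}$, so $\overleftarrow{h} \star \tilde{\mathbf{s}}_m$ is an integrable signal that is
bandlimited to W Hz (Proposition 6.5.2). This integrability guarantees that the
inner products in (26.118) are well-defined (Proposition 25.10.1).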


We can now state the main result of this section: 

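Theorem 26.12.5 (Detecting Known Signals in Colored Noise). Let $M$ take value
in the finite set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, and let the signals $\mathbf{s}_1, \ldots, \mathbf{s}_{\mathsf{M}}$ be integrable signals
that are bandlimited to W Hz. Let the conditional law of $(Y(t))$ given $M = m$ be
that of $s_m(t) + N(t)$, where $(N(t))$ is a stationary, measurable, Gaussian SP of
PSD $S_{NN}$ that can be whitened with respect to the bandwidth W. Let $h$ be the
impulse response of a whitening filter for $(N(t))$. Then:


(i) The inner-products vector (26.118) forms a sufficient statistic for guessing $M$
based on the observation $(Y(t))$.


(ii) Conditional on $M = m$, this vector is Gaussian with mean

$$\bigl(\langle \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_1 \rangle, \ldots, \langle \tilde{\mathbf{s}}_m, \tilde{\mathbf{s}}_{\mathsf{M}} \rangle\bigr)^{\mathsf{T}} \tag{26.119}$$

and $\mathsf{M} \times \mathsf{M}$ covariance matrix

$$\begin{pmatrix}
\langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_1, \tilde{\mathbf{s}}_{\mathsf{M}} \rangle \\
\langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_2, \tilde{\mathbf{s}}_{\mathsf{M}} \rangle \\
\vdots & \vdots & \ddots & \vdots \\
\langle \tilde{\mathbf{s}}_{\mathsf{M}}, \tilde{\mathbf{s}}_1 \rangle & \langle \tilde{\mathbf{s}}_{\mathsf{M}}, \tilde{\mathbf{s}}_2 \rangle & \cdots & \langle \tilde{\mathbf{s}}_{\mathsf{M}}, \tilde{\mathbf{s}}_{\mathsf{M}} \rangle
\end{pmatrix}, \tag{26.120}$$

where

$$\tilde{\mathbf{s}}_j = \mathbf{s}_j \star h, \quad j \in \mathcal{M}, \tag{26.121}$$

and where the inner product $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ can also be expressed as

$$\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle = \int_{-W}^{W} \frac{\hat{s}_{m'}(f)\, \hat{s}^*_{m''}(f)}{S_{NN}(f)}\, df, \quad m', m'' \in \mathcal{M}. \tag{26.122}$$

(iii) If $(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{d'})$ is an orthonormal $d'$-tuple of integrable signals that are
bandlimited to W Hz, and if

$$\tilde{\mathbf{s}}_m \in \operatorname{span}(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_{d'}), \quad m \in \mathcal{M}, \tag{26.123}$$

then the inner products vector

$$\bigl(\langle \mathbf{Y}, \overleftarrow{h} \star \boldsymbol{\phi}_1 \rangle, \ldots, \langle \mathbf{Y}, \overleftarrow{h} \star \boldsymbol{\phi}_{d'} \rangle\bigr)^{\mathsf{T}} \tag{26.124}$$

forms a sufficient statistic for guessing $M$ based on $(Y(t))$ and, conditional
on $M = m$, is a multivariate Gaussian of covariance matrix $\mathsf{I}_{d'}$ and of mean
vector

$$\bigl(\langle \tilde{\mathbf{s}}_m, \boldsymbol{\phi}_1 \rangle, \ldots, \langle \tilde{\mathbf{s}}_m, \boldsymbol{\phi}_{d'} \rangle\bigr)^{\mathsf{T}}. \tag{26.125}$$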


Proof. The sufficiency of the vector (26.118) can be established by first proving 
the result in the binary antipodal case, and by then generalizing the result as we 
did in the proof of Theorem 26.4.1. 


In the binary antipodal case we denote the RV to be guessed by $H$ and assume
that it takes value in $\{0, 1\}$. We assume that, conditional on $H = 0$, the time-$t$
received waveform is $s(t) + N(t)$ whereas, conditional on $H = 1$, it is $-s(t) + N(t)$.
We show that for every $n \in \mathbb{N}$ and any choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$, the inner
product $\langle \mathbf{Y}, \overleftarrow{h} \star \tilde{\mathbf{s}} \rangle$ forms a sufficient statistic for guessing $H$ based on the vector

$$\bigl(Y(t_1), \ldots, Y(t_n), \langle \mathbf{Y}, \overleftarrow{h} \star \tilde{\mathbf{s}} \rangle\bigr)^{\mathsf{T}}.$$


As in the proof of Theorem 26.4.1, this can be established using Lemma 26.8.1 as 
follows. One first notes that, conditional on H, this vector is Gaussian (Proposi- 
tion 25.11.1). One then notes that the conditional covariance matrix of this vector 
conditional on H = 0 is the same as conditional on H = 1 and that this covariance 
matrix can be computed using Theorem 25.12.2. Finally one shows that the vec- 
tor’s conditional mean vector, conditional on H = 0, is antipodal to its conditional 
mean vector, conditional on H = 1, and that both are scaled versions of the last 
column of the conditional covariance matrix. 


Once the sufficiency of the vector (26.118) has been established, the computation 
of its conditional law is straightforward: by Proposition 25.11.1 it is conditionally 
Gaussian, and its conditional mean (26.119) and conditional covariance (26.120) 
are readily derived using Theorem 25.12.2. The derivation of (26.122) follows from 
(26.116) using the Mini Parseval Theorem (Proposition 6.2.6 (i)) and (26.111). 


An alternative way of deriving the conditional distribution is to note that the vector
(26.118) can also be expressed as the vector (26.117) and to then use the result
from Section 26.5.1 by substituting $1$ for $N_0/2$ and $\tilde{\mathbf{s}}_m$ for $\mathbf{s}_m$ for all $m \in \mathcal{M}$.


Part (iii) follows directly from Parts (i) and (ii). 


Since the inner products $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ for $m', m'' \in \mathcal{M}$ determine the conditional law
of the sufficient statistic (see (26.119) and (26.120)), and since, by (26.122), the inner
product $\langle \tilde{\mathbf{s}}_{m'}, \tilde{\mathbf{s}}_{m''} \rangle$ does not depend on the choice of the whitening filter, we obtain:


Note 26.12.6. Neither the conditional distribution of the sufficient statistic vector
in (26.118) nor the optimal probability of error depends on the choice of the
whitening filter.
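The invariance stated in Note 26.12.6 can be observed numerically through (26.122). The sketch below is built on assumptions throughout: the PSD, the two signal spectra, the bandwidth of 1 Hz, and the family of whitening responses (which differ only by a delay) are all invented for illustration. It computes the inner product of the two whitened signals in the frequency domain, as in the derivation of (26.122) from (26.116), and compares it with the right-hand side of (26.122).

```python
import cmath
import math

W = 1.0  # hypothetical bandwidth in Hz


def psd(f: float) -> float:
    """Hypothetical noise PSD, strictly positive on [-W, W]."""
    return 1.0 + 0.5 * math.cos(math.pi * f / W)


def h_hat(f: float, delay: float) -> complex:
    """Response satisfying (26.111) on the band; `delay` alters only the phase."""
    return cmath.exp(-2j * math.pi * f * delay) / math.sqrt(psd(f))


def s1_hat(f: float) -> complex:
    """Hypothetical spectrum of the first signal, supported on [-W, W]."""
    return complex(math.cos(0.5 * math.pi * f / W) ** 2)


def s2_hat(f: float) -> complex:
    """Hypothetical spectrum of the second signal, supported on [-W, W]."""
    return cmath.exp(-0.2j * math.pi * f) * math.cos(0.5 * math.pi * f / W) ** 2


def simpson(g, lo: float, hi: float, n: int = 2000) -> complex:
    """Composite Simpson rule; works for complex-valued integrands."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(lo + i * h)
    return total * h / 3.0


def inner_after_whitening(delay: float) -> complex:
    """Inner product of the two whitened signals, via their spectra."""
    return simpson(lambda f: s1_hat(f) * h_hat(f, delay)
                   * (s2_hat(f) * h_hat(f, delay)).conjugate(), -W, W)


def inner_from_psd() -> complex:
    """Right-hand side of (26.122)."""
    return simpson(lambda f: s1_hat(f) * s2_hat(f).conjugate() / psd(f), -W, W)
```

Whatever delay is passed to `inner_after_whitening`, the result should match `inner_from_psd()` up to rounding, because the phase of the whitening response cancels in the product.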


Using Theorem 26.12.5 we can now derive an optimal rule for guessing M. Indeed, 
in analogy to Theorem 26.6.3 we have: 


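Theorem 26.12.7. Consider the setting of Theorem 26.12.5 with $M$ of prior $\{\pi_m\}$.
The decision rule that guesses uniformly at random from among all the messages
$\tilde{m} \in \mathcal{M}$ for which

$$\ln \pi_{\tilde{m}} - \frac{1}{2} \sum_{\ell=1}^{d} \Bigl(\langle \mathbf{Y}, \overleftarrow{h} \star \boldsymbol{\phi}_\ell \rangle - \langle \mathbf{s}_{\tilde{m}}, \overleftarrow{h} \star \boldsymbol{\phi}_\ell \rangle\Bigr)^2
= \max_{m' \in \mathcal{M}} \biggl\{\ln \pi_{m'} - \frac{1}{2} \sum_{\ell=1}^{d} \Bigl(\langle \mathbf{Y}, \overleftarrow{h} \star \boldsymbol{\phi}_\ell \rangle - \langle \mathbf{s}_{m'}, \overleftarrow{h} \star \boldsymbol{\phi}_\ell \rangle\Bigr)^2\biggr\} \tag{26.126}$$

minimizes the probability of error whenever $h$ is a whitening filter and the tuple
$(\boldsymbol{\phi}_1, \ldots, \boldsymbol{\phi}_d)$ forms an orthonormal basis for $\operatorname{span}(\mathbf{s}_1 \star h, \ldots, \mathbf{s}_{\mathsf{M}} \star h)$.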
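Once the $d$ inner products with the observation and the $d$ inner products with each mean signal have been computed, the rule (26.126) is a finite maximization. The sketch below assumes that these numbers are supplied as plain lists, that all prior probabilities are strictly positive (so that the logarithm is defined), and that ties are detected by exact floating-point equality; the function and parameter names are hypothetical.

```python
import math
import random


def map_guess(t, means, prior, rng=None):
    """Return a message in {1, ..., M} maximizing the score in (26.126).

    t     : list of d numbers, the inner products of Y with the filtered phi's
    means : means[m - 1] is the list of d inner products of s_m with the
            same filtered phi's
    prior : prior[m - 1] is pi_m, assumed strictly positive in this sketch
    rng   : optional random.Random used to break ties uniformly at random
    """
    rng = rng or random.Random()
    scores = [math.log(p) - 0.5 * sum((a - b) ** 2 for a, b in zip(t, mu))
              for p, mu in zip(prior, means)]
    best = max(scores)
    winners = [m for m, score in enumerate(scores, start=1) if score == best]
    return rng.choice(winners)
```

For example, with `means = [[1.0, 0.0], [0.0, 1.0]]` and a uniform prior, the observation `[0.9, 0.1]` is decoded as message 1; with the prior `[0.9, 0.1]` even the equidistant observation `[0.5, 0.5]` is decoded as message 1.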


Before concluding our discussion of detection in the presence of colored noise we 
derive here a sufficient condition for the existence of a whitening filter. 


Proposition 26.12.8 (Existence of a Whitening Filter). Let W > 0 be fixed. If
throughout the interval $[-W, W]$ the PSD $S_{NN}$ is strictly positive and twice
continuously differentiable, then there exists a whitening filter for $S_{NN}$ with respect to
the bandwidth W.


Proof. The proof hinges on the following basic result from harmonic analysis
(Katznelson, 1976, Chapter VI, Section 1, Exercise 7): if a function $f \mapsto g(f)$
is twice continuously differentiable and is zero outside some interval $[-A, A]$, then
it is the FT of some integrable function.


To prove the proposition using this result we begin by picking some $A > W$. We
now define a function $g\colon \mathbb{R} \to \mathbb{R}$ as follows. For $f \ge A$, we define $g(f) = 0$.
For $f$ in the interval $[0, W]$, we define $g(f) = 1/\sqrt{S_{NN}(f)}$. And for $f \in (W, A)$,
we define $g(f)$ so that $g$ is twice continuously differentiable in $[0, \infty)$. We can
thus think of $g$ in $[W, A]$ as an interpolation function whose values and first two
derivatives are specified at the endpoints of the interval. Finally, for $f < 0$, we
define $g(f)$ as $g(-f)$. Figure 26.10 depicts $S_{NN}$, $g$, W, and $A$.


A whitening filter for $S_{NN}$ with respect to the bandwidth W is the integrable
function whose FT is $g$ and whose existence is guaranteed by the quoted result.


26.13 Detecting Signals of Infinite Bandwidth 


So far we have only dealt with the detection problem when the mean signals are 
bandlimited. What if the mean signals are not bandlimited? The difficulty in this 
case is that we cannot assume that the noise PSD is constant over the bandwidth 
occupied by the mean signals, or that the noise can be whitened with respect to 
this bandwidth. 


We can address this issue in three different ways. In the first we can try to find the
optimal detector by studying this more complicated hypothesis testing problem. It
will no longer be the case that the inner products vector (26.15) forms a sufficient
statistic. It will turn out that the optimal detector greatly depends on the
relationship between the rate of decay of the PSD of the noise as the frequency tends
to $\pm\infty$ and the rate of decay of the FT of the mean signals. This approach will
often lead to bad designs, because the structure of the receiver will depend greatly
on how we model the noise, and inaccuracies in our modeling of the noise PSD at
ultra-high frequencies might lead us completely astray in our design.

Figure 26.10: The frequency response of a whitening filter for the PSD $S_{NN}$ with
respect to the bandwidth W.


A more level-headed approach that is valid if the noise PSD is “essentially flat
over the bandwidth of interest” is to ignore the fact that the mean signals are not
bandlimited and to base our decision on the inner products vector, even if this is
not fully justified mathematically. This approach leads to robust designs that are
insensitive to inaccuracies in our modeling of the noise process. If the PSD is not
essentially flat, we can whiten it with respect to a sufficiently large band $[-W, W]$
that contains most of the energy of the mean signals.


The third approach is to use very complicated mathematical machinery involving
the Itô Calculus (Karatzas and Shreve, 1991) to model the noise in a way that will
result in the inner products forming a sufficient statistic. We have chosen not to
pursue this approach because it requires modeling the noise as a process of infinite
power, which is physically unappealing. This approach just shifts the burden of
proof from one place to another. Indeed, the Itô Calculus can now prove for us
that the inner products vector is sufficient, but we need a leap of faith in modeling
the noise as a process of infinite power.


In the future, in dealing with mean signals that are not bandlimited, we shall refer
to the “white noise paradigm” as the paradigm under which the receiver forms its
decision based on the inner products vector (26.15) and under which these inner
products have the conditional law derived in Section 26.5.1.


26.14 A Proof of Lemma 26.8.1


We next present a proof of Lemma 26.8.1 that is also valid when the matrix $\Lambda$ is
singular. We denote the Row-$j$ Column-$k$ component of this matrix by $\Lambda^{(j,k)}$.


Proof. We first treat the case where the variance $\Lambda^{(n,n)}$ of the last component $Y^{(n)}$
of the random $n$-vector $\mathbf{Y}$ is zero. By the Covariance Inequality it follows that for
every $j \in \{1, \ldots, n\}$

$$\bigl|\Lambda^{(j,n)}\bigr| = \bigl|\operatorname{Cov}\bigl[Y^{(j)}, Y^{(n)}\bigr]\bigr| \le \sqrt{\operatorname{Var}\bigl[Y^{(j)}\bigr]}\, \sqrt{\operatorname{Var}\bigl[Y^{(n)}\bigr]} = \sqrt{\Lambda^{(j,j)}\, \Lambda^{(n,n)}},$$

so in this case the $n$-th column of $\Lambda$ is zero. Consequently, since the mean vector $\boldsymbol{\mu}$
is by assumption proportional to the last column of $\Lambda$, it follows that in this case
$\boldsymbol{\mu} = \mathbf{0}$. But for $\boldsymbol{\mu} = \mathbf{0}$ the conditional law of $\mathbf{Y}$ given $H = 0$ is the same as given
$H = 1$, so $\mathbf{Y}$ is useless for guessing $H$, and any measurable function of $\mathbf{Y}$, and a
fortiori its last component, forms a sufficient statistic (albeit also useless).


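We next turn to the more interesting case where

$$\Lambda^{(n,n)} > 0. \tag{26.127}$$

In this case we can write the assumption that $\boldsymbol{\mu}$ is a scaled version of the last
column of $\Lambda$ as

$$\mu^{(j)} = \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mu^{(n)}, \quad j \in \{1, \ldots, n\}. \tag{26.128}$$

We need to show that, irrespective of the prior on $H$, (26.128) implies that

$$H \,-\!\!\circ\!\!-\, Y^{(n)} \,-\!\!\circ\!\!-\, \mathbf{Y} \tag{26.129}$$

forms a Markov chain or, equivalently, that

$$H \,-\!\!\circ\!\!-\, Y^{(n)} \,-\!\!\circ\!\!-\, \mathbf{R} \tag{26.130}$$

forms a Markov chain, where $\mathbf{R}$ is the random $(n-1)$-vector of components

$$R^{(j)} = Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)}, \quad j \in \{1, \ldots, n-1\}. \tag{26.131}$$

(Conditional on $Y^{(n)}$, we have that $\frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}} Y^{(n)}$ is deterministic, so $R^{(j)}$ and $Y^{(j)}$ only
differ by a deterministic constant.) Thus, we need to show that $\mathbf{R}$ is irrelevant for
guessing $H$ based on $Y^{(n)}$. This we prove using Proposition 22.5.5 by showing that
$Y^{(n)}$ and $\mathbf{R}$ are conditionally independent given $H$

$$Y^{(n)} \,-\!\!\circ\!\!-\, H \,-\!\!\circ\!\!-\, \mathbf{R} \tag{26.132}$$

and that $H$ and $\mathbf{R}$ are independent.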
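We begin by proving (26.132). We first note that, conditional on $H = 0$, the vector

$$\bigl(R^{(1)}, \ldots, R^{(n-1)}, Y^{(n)}\bigr)^{\mathsf{T}}$$

is Gaussian, because it is the result of linearly transforming the vector $\mathbf{Y}$, which,
conditional on $H = 0$, is Gaussian (Proposition 23.6.3). Also, conditional on
$H = 0$, we have that $Y^{(n)}$ is uncorrelated with the components of $\mathbf{R}$ because

$$\begin{aligned}
\operatorname{Cov}\bigl[Y^{(n)}, R^{(j)} \bigm| H = 0\bigr]
&= \operatorname{Cov}\Bigl[Y^{(n)},\, Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 0\Bigr] \\
&= \operatorname{Cov}\bigl[Y^{(n)}, Y^{(j)} \bigm| H = 0\bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \operatorname{Cov}\bigl[Y^{(n)}, Y^{(n)} \bigm| H = 0\bigr] \\
&= \Lambda^{(j,n)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \Lambda^{(n,n)} \\
&= 0, \quad j \in \{1, \ldots, n-1\}.
\end{aligned}$$

By Corollary 23.6.9, we conclude that, conditional on $H = 0$, we have that $Y^{(n)}$ is
independent of $\mathbf{R}$. Repeating this argument for the case where the conditioning is
on $H = 1$ proves (26.132).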


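We next verify that $\mathbf{R}$ and $H$ are independent. We do so by showing that the
conditional distribution of $\mathbf{R}$ given $H = 0$ is identical to its conditional distribution
given $H = 1$. Since under both conditionings $\mathbf{R}$ is Gaussian, it suffices to show
that the conditional covariance of $\mathbf{R}$ given $H = 0$ is the same as given $H = 1$ and
similarly for the mean. To show that the covariances are the same is easy, because
the conditional covariance of $\mathbf{R}$ is determined by $\Lambda$, which is the same under the
two hypotheses. As to the mean we have

$$\begin{aligned}
\mathsf{E}\bigl[R^{(j)} \bigm| H = 0\bigr]
&= \mathsf{E}\Bigl[Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 0\Bigr] \\
&= \mathsf{E}\bigl[Y^{(j)} \bigm| H = 0\bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mathsf{E}\bigl[Y^{(n)} \bigm| H = 0\bigr] \\
&= \mu^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mu^{(n)} \\
&= 0, \quad j \in \{1, \ldots, n-1\},
\end{aligned}$$

where the last equality follows from (26.128). Similarly, under $H = 1$ we have

$$\begin{aligned}
\mathsf{E}\bigl[R^{(j)} \bigm| H = 1\bigr]
&= \mathsf{E}\Bigl[Y^{(j)} - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, Y^{(n)} \Bigm| H = 1\Bigr] \\
&= \mathsf{E}\bigl[Y^{(j)} \bigm| H = 1\bigr] - \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mathsf{E}\bigl[Y^{(n)} \bigm| H = 1\bigr] \\
&= -\mu^{(j)} + \frac{\Lambda^{(j,n)}}{\Lambda^{(n,n)}}\, \mu^{(n)} \\
&= 0, \quad j \in \{1, \ldots, n-1\},
\end{aligned}$$

thus establishing that the mean of $\mathbf{R}$ does not depend on $H$ either.


Having established that $\mathbf{R}$ and $H$ are independent, it now follows from the
conditional independence of $Y^{(n)}$ and $\mathbf{R}$ given $H$ (26.132) that $\mathbf{R}$ is irrelevant for
guessing $H$ based on $Y^{(n)}$ (Proposition 22.5.5).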
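The two computations at the heart of the proof, the vanishing covariance between $Y^{(n)}$ and $R^{(j)}$ and the vanishing mean of $R^{(j)}$, can be replayed on a concrete example. The sketch below is an illustration under assumptions: the $3 \times 3$ covariance matrix is generated from an arbitrary factor, the proportionality constant is arbitrary, and the mean under $H = 0$ is taken to be $+\boldsymbol{\mu}$.

```python
n = 3

# Hypothetical factor B; Lambda = B B^T is then a valid covariance matrix.
B = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [0.2, -0.3, 0.8]]
cov = [[sum(B[i][k] * B[j][k] for k in range(n)) for j in range(n)]
       for i in range(n)]

# Mean vector proportional to the last column of Lambda, as the lemma assumes.
scale = 1.7                                    # arbitrary constant
mu = [scale * cov[j][n - 1] for j in range(n)]

# Coefficients Lambda^(j,n) / Lambda^(n,n) appearing in (26.131).
ratio = [cov[j][n - 1] / cov[n - 1][n - 1] for j in range(n - 1)]

# Cov[Y^(n), R^(j)] and E[R^(j)] under H = 0, following the displays above.
cov_r_yn = [cov[j][n - 1] - ratio[j] * cov[n - 1][n - 1] for j in range(n - 1)]
mean_r_h0 = [mu[j] - ratio[j] * mu[n - 1] for j in range(n - 1)]
mean_r_h1 = [-mu[j] + ratio[j] * mu[n - 1] for j in range(n - 1)]
```

All three lists should contain only zeros up to rounding, which is what makes $\mathbf{R}$ independent of both $Y^{(n)}$ (given $H$) and $H$.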
26.15 Exercises


Exercise 26.1 (Reducing the Number of Matched Filters). We saw in Section 26.4 how to
obtain a $d$-dimensional sufficient statistics vector, where $d$ is the dimension of the linear
subspace spanned by the mean signals (26.17). Show that, given any integrable signal $\mathbf{s}_0$
that is bandlimited to W Hz, we can find a $d'$-dimensional sufficient statistics vector,
where

$$d' = \operatorname{Dim}\bigl(\operatorname{span}(\mathbf{s}_1 - \mathbf{s}_0, \ldots, \mathbf{s}_{\mathsf{M}} - \mathbf{s}_0)\bigr).$$

Show that $d'$ is sometimes smaller than $d$.


Exercise 26.2 (Nearest-Neighbor Decoding Revisited). The form of the decoder in Theorem 26.6.3 (ii) is different from the nearest-neighbor rule of Proposition 21.6.1 (ii).
Why does minimizing $\|Y - s_m\|_2$ not make mathematical sense in the setting of Theorem 26.6.3?


Exercise 26.3 (Proving Sufficiency). In Section 26.8.3 we sketched an argument for the 
sufficiency of the vector in (26.57). Fill in the details. 


Exercise 26.4 (Minimum Shift Keying). Let the signals $s_0$, $s_1$ be given at every t ∈ R by


ss _ aur loe ony was _ cos(2m fit) {0 < t < To}. 


(i) Compute the energies $\|s_0\|_2^2$, $\|s_1\|_2^2$. You may assume that $f_0 T_s \gg 1$ and $f_1 T_s \gg 1$.
(ii) Under what conditions on $f_0$, $f_1$, and $T_s$ are $s_0$ and $s_1$ orthogonal?


(iii) Assume that the parameters are chosen as in Part (ii). Let H take on the values 0
and 1 equiprobably, and assume that, conditional on H = ν, the time-t received
waveform is $s_\nu(t) + N(t)$, where (N(t)) is white Gaussian noise of double-sided
PSD $N_0/2$ with respect to the bandwidth of interest, and ν ∈ {0, 1}. Find an
optimal rule for guessing H based on the received waveform.


(iv) Compute the optimal probability of error. 


Exercise 26.5 (Signaling in White Gaussian Noise). Let the RV M take value in the set
$\mathcal{M} = \{1, 2, 3, 4\}$ uniformly. Conditional on M = m, the observed waveform (Y(t)) is
given at every time t ∈ R by $s_m(t) + N(t)$, where the signals $s_1, s_2, s_3, s_4$ are given by

\[
\begin{aligned}
s_1(t) &= A\,\mathrm{I}\{0 \le t \le T\}, &
s_2(t) &= A\,\mathrm{I}\{0 \le t \le T/2\} - A\,\mathrm{I}\{T/2 < t \le T\}, \\
s_3(t) &= 2A\,\mathrm{I}\{0 \le t \le T/2\}, &
s_4(t) &= -A\,\mathrm{I}\{0 \le t \le T/2\} + A\,\mathrm{I}\{T/2 < t \le T\},
\end{aligned}
\]

and where (N(t)) is white Gaussian noise of PSD $N_0/2$ over the bandwidth of interest.
(Ignore the fact that the signals are not bandlimited.)

(i) Derive the MAP rule for guessing M based on (Y(t)).

(ii) Use the Union-of-Events Bound to upper bound $p_{\mathrm{MAP}}(\text{error} \mid M = 3)$. Are all the
terms in the bound needed?

(iii) Compute $p_{\mathrm{MAP}}(\text{error} \mid M = 3)$ exactly.


(iv) Show that by subtracting a waveform s. from each of the signals s1,$2,$3,S4, we 
can reduce the average transmitted energy without degrading performance. What 
waveform s. should be subtracted to minimize the transmitted energy? 
Exercise 26.6 (QPSK). Let the IID random bits $D_1$ and $D_2$ be mapped to the symbols
$X_1$, $X_2$ according to the rule

\[ (0,0) \mapsto (1,0), \quad (0,1) \mapsto (-1,0), \quad (1,0) \mapsto (0,1), \quad (1,1) \mapsto (0,-1). \]

The received waveform (Y(t)) is given by

\[ Y(t) = A X_1 \phi_1(t) + A X_2 \phi_2(t) + N(t), \quad t \in \mathbb{R}, \]

where A > 0, the signals $\phi_1$, $\phi_2$ are orthonormal integrable signals that are bandlimited
to W Hz, and the SP (N(t)) is independent of $(D_1, D_2)$ and is white Gaussian noise of
PSD $N_0/2$ with respect to the bandwidth W.

(i) Find an optimal rule for guessing $(D_1, D_2)$ based on (Y(t)).

(ii) Find an optimal rule for guessing $D_1$ based on (Y(t)).

(iii) Compare the rule that you have found in Part (ii) with the rule that guesses that $D_1$
is the first component of the tuple produced by the decoder that you have found
in Part (i). Evaluate the probability of error for both rules.

(iv) Repeat when $(D_1, D_2)$ are mapped to $(X_1, X_2)$ according to the rule

\[ (0,0) \mapsto (1,0), \quad (0,1) \mapsto (0,1), \quad (1,0) \mapsto (-1,0), \quad (1,1) \mapsto (0,-1). \]


Exercise 26.7 (Mismatched Decoding of Antipodal Signaling). Let the received waveform (Y(t)) be given at every t ∈ R by $(1 - 2H)\, s(t) + N(t)$, where s is an integrable signal
that is bandlimited to W Hz, (N(t)) is white Gaussian noise of PSD $N_0/2$ with respect
to the bandwidth W, and H takes on the values 0 and 1 equiprobably and independently
of (N(t)). Let s′ be an integrable signal that is bandlimited to W Hz. A suboptimal
detector feeds the received waveform to a matched filter for s′ and guesses according to
the filter’s time-0 output: if it is positive, it guesses “H = 0,” and if it is negative, it
guesses “H = 1.” Express this detector’s probability of error in terms of s, s′, and $N_0$.


Exercise 26.8 (Imperfect Automatic Gain Control). Let the received signal (Y(t)) be
given by

\[ Y(t) = A X s(t) + N(t), \quad t \in \mathbb{R}, \]

where A > 0 is some deterministic positive constant, X is a RV that takes value in
the set {−3, −1, +1, +3} uniformly, s is an integrable signal that is bandlimited to W
Hz, and (N(t)) is white Gaussian noise of double-sided PSD $N_0/2$ with respect to the
bandwidth W.

(i) Find an optimal rule for guessing X based on (Y(t)).

(ii) Using the Q-function compute the optimal probability of error.

(iii) Suppose you use the rule you have found in Part (i), but the received signal is

\[ Y(t) = \frac{3}{4} A X s(t) + N(t), \quad t \in \mathbb{R}. \]

(You were misinformed about the amplitude of the signal.) What is the probability
of error now?
Exercise 26.9 (Positive Semidefinite Matrices).

(i) Let $s_1, \ldots, s_M$ be of finite energy. Show that the M × M matrix whose Row-j
Column-ℓ entry is $\langle s_j, s_\ell \rangle$ is positive semidefinite.

(ii) Show that any M × M positive semidefinite matrix can be expressed in this form
with a proper choice of the signals $s_1, \ldots, s_M$.


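Exercise 26.10 (A Lower Bound on the Minimum Distance). Let $s_1, \ldots, s_M$ be equi-energy signals of energy $E_s$. Let

\[ \bar{d}^2 \triangleq \frac{1}{M(M-1)} \sum_{m'} \sum_{m'' \neq m'} \|s_{m'} - s_{m''}\|_2^2 \]

denote the average squared-distance between the signals.

(i) Justify the following bound on $\bar{d}$:

\[
\begin{aligned}
\bar{d}^2 &= \frac{1}{M(M-1)} \sum_{m'=1}^{M} \sum_{m''=1}^{M} \|s_{m'} - s_{m''}\|_2^2 \\
&= \frac{2M}{M-1} E_s - \frac{2M}{M-1} \cdot \frac{1}{M^2} \sum_{m'=1}^{M} \sum_{m''=1}^{M} \langle s_{m'}, s_{m''} \rangle \\
&= \frac{2M}{M-1} E_s - \frac{2M}{M-1} \biggl\| \frac{1}{M} \sum_{m=1}^{M} s_m \biggr\|_2^2 \\
&\le \frac{2M}{M-1} E_s.
\end{aligned}
\]

(ii) Show that if, in addition, $\langle s_{m'}, s_{m''} \rangle = \rho E_s$ for all $m' \neq m''$ in {1, …, M}, then

\[ -\frac{1}{M-1} \le \rho. \]

(iii) Are equalities possible in the above bounds?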


Exercise 26.11 (Generalizations of the Simplex). Let $p^*(\text{error}; E_s; \rho; M; N_0)$ denote the
optimal probability of error for the setup of Section 26.2 for the case where the prior
on M is uniform and where

\[
\langle s_{m'}, s_{m''} \rangle =
\begin{cases}
E_s & \text{if } m' = m'', \\
\rho E_s & \text{otherwise},
\end{cases}
\qquad m', m'' \in \mathcal{M}.
\]

Show that

\[ p^*(\text{error}; E_s; \rho; M; N_0) = p^*\bigl(\text{error}; E_s(1-\rho); 0; M; N_0\bigr). \]

Hint: You may need a different proof depending on the sign of ρ.


Exercise 26.12 (Decoding the Simplex without Gain Control). Let the simplex constellation $s_1, \ldots, s_M$ be constructed from the orthonormal signals $\phi_1, \ldots, \phi_M$ as in Section 26.11.4. In that section we proposed to decode by adding

to the received signal Y and then feeding the result to a decoder that was designed for
the orthogonal signals 


Here w is any signal that is orthogonal to the signals {s1,...,sm}. Show that feeding the 
signal Y +a to the above orthogonal-keying decoder also results in an optimal decoding 
rule, irrespective of the value of a € R. 


Exercise 26.13 (Pretending the Noise Is White). Let H take on the values 0 and 1
equiprobably, and let the received waveform (Y(t)) be given at time t by

\[ Y(t) = (1 - 2H)\, s(t) + N(t), \]

where $s\colon t \mapsto \mathrm{I}\{0 \le t \le 1\}$, and where the SP (N(t)) is independent of H and is a
measurable, centered, stationary, Gaussian SP of autocovariance function


Kyn(T) = ae, TER, 


where 0 < α < ∞ is some deterministic real parameter. Compute the probability of error
of a detector that guesses “H = 0” whenever

\[ \int_0^1 Y(t)\, dt > 0. \]

To what does this probability of error converge when α tends to zero?


Exercise 26.14 (Antipodal Signaling in Colored Noise). Let s be an integrable signal that
is bandlimited to W Hz, and let H take on the values 0 and 1 equiprobably. Let the time-t
value of the received signal (Y(t)) be given by $(1 - 2H)\, s(t) + N(t)$, where (N(t)) is a
measurable, centered, stationary, Gaussian SP of autocovariance function $K_{NN}$. Assume
that H and (N(t)) are independent, and that $K_{NN}$ can be whitened with respect to the
bandwidth W. Find the optimal probability of error in guessing H based on (Y(t)).


Exercise 26.15 (Modeling Artifacts). Let H take on the values 0 and 1 equiprobably, and
let the received signal (Y(t)) be given by

\[ Y(t) = (1 - 2H)\, s(t) + N(t), \quad t \in \mathbb{R}, \]

where $s\colon t \mapsto \mathrm{I}\{0 \le t \le 1\}$ and the SP (N(t)) is independent of H and is a measurable,
centered, stationary, Gaussian SP of autocovariance function


for some α, β > 0.


Argue heuristically that, irrespective of the values of α and β, for any ε > 0 we can find
a rule for guessing H based on (Y(t)) whose probability of error is smaller than ε.


Hint: Study $\hat{s}(f)$ and $S_{NN}(f)$ at high frequencies f.
Exercise 26.16 (Measurability in Theorem 26.3.2).

(i) Let (N(t)) be white Gaussian noise of double-sided PSD $N_0/2$ with respect to the
bandwidth W. Let R be a unit-mean exponential RV that is independent of (N(t)).
Define the SP

\[ \tilde{N}(t) = N(t)\, \mathrm{I}\{t \neq R\}, \quad t \in \mathbb{R}. \]

Show that $(\tilde{N}(t))$ is white Gaussian noise of double-sided PSD $N_0/2$ with respect
to the bandwidth W.

(ii) Let s be a nonzero integrable signal that is bandlimited to W Hz. To be concrete,

\[ s(t) = \operatorname{sinc}^2(Wt), \quad t \in \mathbb{R}. \]

Suppose that the SP (N(t)) is as above and that for every ω ∈ Ω the sample-path
$t \mapsto N(\omega, t)$ is continuous. Construct $(\tilde{N}(t))$ as above. Suppose you wish to test
whether you are observing s or −s in the additive noise $(\tilde{N}(t))$. Show that you can
guess with zero probability of error by finding an epoch where the observed SP is
discontinuous and by comparing the value of the received signal at that epoch to
the value of s. (This does not violate Theorem 26.3.2 because this decision rule is
not measurable with respect to the Borel σ-algebra generated by the observed SP.)


Chapter 27 


Noncoherent Detection and Nuisance 
Parameters 


27.1 Introduction and Motivation 


In this chapter we discuss a problem that arises in noncoherent detection. To motivate the problem, consider a setup where a transmitter sends one of two different
passband waveforms

\[ t \mapsto 2\operatorname{Re}\bigl(s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \quad \text{or} \quad t \mapsto 2\operatorname{Re}\bigl(s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \]

where $s_{0,\mathrm{BB}}$ and $s_{1,\mathrm{BB}}$ are integrable baseband signals that are bandlimited to W/2
Hz, and where the carrier frequency $f_c$ satisfies $f_c > W/2$. To motivate our problem
it is instructive to consider the case where

\[ f_c \gg W. \tag{27.1} \]

(In wireless communications it is common for $f_c$ to be three orders of magnitude
larger than W.) Let X(t) denote the transmitted waveform at time t. Suppose
that the received waveform (Y(t)) is a delayed version of the transmitted waveform
corrupted by white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W
around the carrier frequency $f_c$ (Definition 25.15.3):

\[ Y(t) = X(t - t_D) + N(t), \quad t \in \mathbb{R}, \]

where $t_D$ denotes the delay (typically proportional to the distance between the
transmitter and the receiver) and (N(t)) is the additive noise. Suppose further
that the receiver estimates the delay to be $t'_D$ and moves its clock back by defining

\[ t' \triangleq t - t'_D. \tag{27.2} \]

If $\tilde{Y}(t')$ is what the receiver receives when its clock shows t′, then by (27.2)

\[
\begin{aligned}
\tilde{Y}(t') &= Y(t' + t'_D) \\
&= X(t' + t'_D - t_D) + N(t' + t'_D) \\
&= X(t' + t'_D - t_D) + \tilde{N}(t'), \quad t' \in \mathbb{R},
\end{aligned}
\]
where $\tilde{N}(t') \triangleq N(t' + t'_D)$ and is thus, by the stationarity of (N(t)), also white
Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W around $f_c$. The
term $X(t' + t'_D - t_D)$ can be more explicitly written for every t′ ∈ R as

\[ X(t' + t'_D - t_D) = 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t' + t'_D - t_D)\, e^{i 2\pi f_c (t' + t'_D - t_D)}\bigr), \tag{27.3} \]

where ν is either zero or one, depending on which waveform is sent.


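We next argue that if

\[ |t'_D - t_D| \ll \frac{1}{W}, \tag{27.4} \]

then

\[ s_{\nu,\mathrm{BB}}(t' + t'_D - t_D) \approx s_{\nu,\mathrm{BB}}(t'), \quad t' \in \mathbb{R}. \tag{27.5} \]

This can be seen by considering a Taylor Series expansion for $s_{\nu,\mathrm{BB}}(\cdot)$ around t′

\[ s_{\nu,\mathrm{BB}}(t' + t'_D - t_D) \approx s_{\nu,\mathrm{BB}}(t') + \frac{\mathrm{d} s_{\nu,\mathrm{BB}}(\tau)}{\mathrm{d}\tau}\bigg|_{\tau = t'} (t'_D - t_D) \]

and by then using Bernstein’s Inequality (Theorem 6.7.1) to heuristically argue that
the derivative of the baseband signal is of order of magnitude W, so its product by
the timing error is, by (27.4), negligible.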


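From (27.3) and (27.5) we obtain that, as long as (27.4) holds,

\[
\begin{aligned}
X(t' + t'_D - t_D) &\approx 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i 2\pi f_c (t' + t'_D - t_D)}\bigr) \\
&= 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i(2\pi f_c t' + \theta)}\bigr), \quad t' \in \mathbb{R},
\end{aligned}
\tag{27.6a}
\]

where

\[ \theta = 2\pi f_c (t'_D - t_D) \bmod [-\pi, \pi). \tag{27.6b} \]

(Recall that ξ mod [−π, π) is the element in the interval [−π, π) that differs from ξ
by an integer multiple of 2π.) Note that even if (27.4) holds, the term $2\pi f_c (t'_D - t_D)$
may be much larger than 1 when $f_c \gg W$.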
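To make the last remark concrete, the following minimal sketch evaluates (27.6b) for one set of illustrative numbers. The bandwidth, carrier frequency, and timing error are assumptions chosen for the example, not values from the text, and the function names are hypothetical.

```python
import math


def wrap_phase(xi: float) -> float:
    """Return xi mod [-pi, pi): the point of [-pi, pi) that differs
    from xi by an integer multiple of 2*pi (up to rounding)."""
    return (xi + math.pi) % (2.0 * math.pi) - math.pi


def residual_phase(fc_hz: float, delay_error_s: float) -> float:
    """Phase theta of (27.6b) for a delay-estimation error t'_D - t_D."""
    return wrap_phase(2.0 * math.pi * fc_hz * delay_error_s)


# Assumed example values (not from the text):
W = 1.0e6            # signal bandwidth in Hz
fc = 1.0e9           # carrier frequency in Hz, three orders above W
error = 10.3e-9      # delay-estimation error in seconds

print("error * W            =", error * W)                 # small compared to 1
print("2*pi*fc*error (rad)  =", 2.0 * math.pi * fc * error)  # many radians
print("theta (rad)          =", residual_phase(fc, error))
```

With these numbers the timing error is about one percent of 1/W, so (27.4) is comfortably satisfied, yet the carrier advances by more than ten full cycles, which is why the wrapped phase θ is effectively unpredictable to the receiver.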


We conclude that if the error in estimating the delay is negligible compared to the
reciprocal of the signal bandwidth but significantly larger than the reciprocal of
the carrier frequency, then the received waveform can be modeled as

\[ \tilde{Y}(t') = 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t')\, e^{i(2\pi f_c t' + \theta)}\bigr) + \tilde{N}(t'), \quad t' \in \mathbb{R}, \tag{27.7} \]

where the receiver needs to determine whether ν is equal to zero or one; $(\tilde{N}(t'))$
is additive white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W
around $f_c$; and where the phase θ is unknown to the receiver. Since the phase is
unknown to the receiver, the detection is said to be noncoherent. In the statistics
literature an unknown parameter such as θ is called a nuisance parameter.


It would make engineering sense to ask for a decision rule for guessing ν based
on $(\tilde{Y}(t'))$ that would work well irrespective of the value of θ, but this is not the
question we shall ask. This question is related to “composite hypothesis testing,”
which is not treated in this book.¹ Instead we shall adopt a probabilistic approach.
We shall assume that θ is a random variable (and therefore henceforth denote it
by Θ and its realization by θ) that is uniformly distributed over the interval [−π, π)
independently of the noise and the message, and we shall seek a decision rule that
has the smallest average probability of error. Thus, if we denote the probability
of error conditional on Θ = θ by p(error|θ), then we seek a decision rule based
on (Y(t)) that minimizes

\[ \frac{1}{2\pi} \int_{-\pi}^{\pi} p(\text{error} \mid \theta)\, \mathrm{d}\theta. \tag{27.8} \]

The conservative reader may prefer to minimize the probability of error on the
“worst case θ”

\[ \sup_{\theta \in [-\pi, \pi)} p(\text{error} \mid \theta) \tag{27.9} \]

but, miraculously, it will turn out that the decoder we shall derive to minimize (27.8)
has a conditional probability of error p(error|θ) that does not depend on the realization θ so, as we shall see in Section 27.7, our decoder also minimizes (27.9).


27.2 The Setup 


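We next define our hypothesis testing problem. We denote time by t and the
received waveform by (Y(t)) (even though in the scenario we described in Section 27.1 these correspond to t′ and $(\tilde{Y}(t'))$, i.e., to the time coordinate and to the
corresponding signal at the receiver). We denote the RV we wish to guess by H
and assume a uniform prior:

\[ \Pr[H = 0] = \Pr[H = 1] = \frac{1}{2}. \tag{27.10} \]

For each ν ∈ {0, 1} the observation (Y(t)) is, conditionally on H = ν, a SP of the
form

\[ Y(t) = S_\nu(t) + N(t), \quad t \in \mathbb{R}, \tag{27.11} \]

where (N(t)) is white Gaussian noise of positive PSD $N_0/2$ with respect to the
bandwidth W around the carrier frequency $f_c$ (Definition 25.15.3), and where $S_\nu(t)$
can be described as

\[
\begin{aligned}
S_\nu(t) &= 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_c t + \Theta)}\bigr) \\
&= 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \cos\Theta - 2\operatorname{Im}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \sin\Theta \\
&= 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \cos\Theta + 2\operatorname{Re}\bigl(i\, s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr) \sin\Theta \\
&= s_{\nu,c}(t) \cos\Theta + s_{\nu,s}(t) \sin\Theta, \quad t \in \mathbb{R},
\end{aligned}
\tag{27.12}
\]

where Θ is a RV that is uniformly distributed over the interval [−π, π) independently of (H, (N(t))), and where we define for ν ∈ {0, 1}

\[ s_{\nu,c}(t) \triangleq 2\operatorname{Re}\bigl(s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \tag{27.13a} \]
\[ s_{\nu,s}(t) \triangleq 2\operatorname{Re}\bigl(i\, s_{\nu,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}. \tag{27.13b} \]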

¹ See, for example, (Lehmann and Romano, 2005, Chapter 3).
Notice that by (27.13) and by the relationship between inner products in baseband
and passband (Theorem 7.6.10),

\[ \langle s_{\nu,c}, s_{\nu,s} \rangle = 0, \quad \nu = 0, 1. \tag{27.14} \]

We assume that the baseband signals $s_{0,\mathrm{BB}}$, $s_{1,\mathrm{BB}}$ are integrable complex signals
that are bandlimited to W/2 Hz and that they are orthogonal:

\[ \langle s_{0,\mathrm{BB}}, s_{1,\mathrm{BB}} \rangle = 0. \tag{27.15} \]

Consequently, by (27.13) and Theorem 7.6.10,

\[ \langle s_{0,c}, s_{1,c} \rangle = \langle s_{0,s}, s_{1,c} \rangle = \langle s_{0,c}, s_{1,s} \rangle = \langle s_{0,s}, s_{1,s} \rangle = 0. \tag{27.16} \]

We finally assume that the baseband signals $s_{0,\mathrm{BB}}$ and $s_{1,\mathrm{BB}}$ are of equal positive
energy:

\[ \|s_{0,\mathrm{BB}}\|_2^2 = \|s_{1,\mathrm{BB}}\|_2^2 > 0. \tag{27.17} \]

Defining²

\[ E_s \triangleq 2\, \|s_{0,\mathrm{BB}}\|_2^2, \tag{27.18} \]

we have by the relationship between energy in baseband and passband (Theorem 7.6.10)

\[ E_s = \|S_0\|_2^2 = \|S_1\|_2^2 = \|s_{0,s}\|_2^2 = \|s_{0,c}\|_2^2 = \|s_{1,s}\|_2^2 = \|s_{1,c}\|_2^2. \tag{27.19} \]

By (27.14), (27.16), and (27.18)

\[ \frac{1}{\sqrt{E_s}} \bigl(s_{0,c}, s_{0,s}, s_{1,c}, s_{1,s}\bigr) \ \text{is an orthonormal 4-tuple.} \tag{27.20} \]

Our problem is to guess H based on the observation (Y(t)).


27.3 A Sufficient Statistic 


To derive an optimal guessing rule, we begin by deriving a sufficient statistic vector.
This vector takes value in $\mathbb{R}^4$ and enables us to simplify the guessing problem from
one where the observation consists of a SP to one where it consists of a random
4-vector. We shall later find an even more concise sufficient statistic vector with
only two components. We denote the sufficient statistic vector by T and its four
components by $T_{0,c}$, $T_{0,s}$, $T_{1,c}$, and $T_{1,s}$:

\[ T = \bigl(T_{0,c}, T_{0,s}, T_{1,c}, T_{1,s}\bigr)^{\mathsf{T}}. \]

We denote its realization by t with corresponding components

\[ t = \bigl(t_{0,c}, t_{0,s}, t_{1,c}, t_{1,s}\bigr)^{\mathsf{T}}. \]

² The “s” in $E_s$ stands for “signal,” whereas the “s” in $s_{0,s}$ and $s_{1,s}$ stands for “sine.”
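The vector T is defined by

\[
\begin{aligned}
T &\triangleq \biggl( \Bigl\langle Y, \frac{s_{0,c}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle Y, \frac{s_{0,s}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle Y, \frac{s_{1,c}}{\sqrt{E_s}} \Bigr\rangle, \Bigl\langle Y, \frac{s_{1,s}}{\sqrt{E_s}} \Bigr\rangle \biggr)^{\mathsf{T}} && (27.21) \\
&= \frac{2}{\sqrt{E_s}} \biggl( \int_{-\infty}^{\infty} Y(t) \operatorname{Re}\bigl(s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr)\, \mathrm{d}t, \ \int_{-\infty}^{\infty} Y(t) \operatorname{Re}\bigl(i\, s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr)\, \mathrm{d}t, \\
&\qquad\qquad \int_{-\infty}^{\infty} Y(t) \operatorname{Re}\bigl(s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr)\, \mathrm{d}t, \ \int_{-\infty}^{\infty} Y(t) \operatorname{Re}\bigl(i\, s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr)\, \mathrm{d}t \biggr)^{\mathsf{T}}. && (27.22)
\end{aligned}
\]

We now prove that it forms a sufficient statistic for guessing H based on the
observation (Y(t)). It is interesting to note that this sufficiency also holds for Θ
of arbitrary distribution (not necessarily uniform) provided that the pair (H, Θ) is
independent of the additive noise. Moreover, it holds even if the baseband signals
$s_{0,\mathrm{BB}}$ and $s_{1,\mathrm{BB}}$ are not orthogonal.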


Before proving the sufficiency of T we give a plausibility argument. To that end we
consider a new (hypothetical) scenario where Θ, rather than being uniform, now
takes value in a finite set $\{\theta_1, \ldots, \theta_\kappa\}$ according to some arbitrary distribution.
Suppose further that rather than just being interested in H we also wish to guess
the value of Θ. Thus, rather than just guessing H we wish to guess the pair (H, Θ),
which takes value in the set

\[ \bigl\{(0, \theta_1), (1, \theta_1), (0, \theta_2), (1, \theta_2), \ldots, (0, \theta_\kappa), (1, \theta_\kappa)\bigr\}. \]

In this new scenario we have for every ν ∈ {0, 1} and every η ∈ {1, …, κ} that,
conditional on $(H, \Theta) = (\nu, \theta_\eta)$, the observation (Y(t)) consists of the signal
$t \mapsto s_{\nu,c}(t) \cos\theta_\eta + s_{\nu,s}(t) \sin\theta_\eta$ corrupted by additive Gaussian noise (N(t)). Since
for every such ν and η the signal $t \mapsto s_{\nu,c}(t) \cos\theta_\eta + s_{\nu,s}(t) \sin\theta_\eta$ can be written
as a linear combination of the signals $s_{0,c}$, $s_{0,s}$, $s_{1,c}$, and $s_{1,s}$, it follows from Theorem 26.4.1 that in this new scenario T forms a sufficient statistic for guessing
the pair (H, Θ) based on (Y(t)). But what if we are only interested in guessing H? Guessing H in this scenario reduces to guessing whether the pair (H, Θ)
is in the set $\{(0, \theta_1), (0, \theta_2), \ldots, (0, \theta_\kappa)\}$ or in the set $\{(1, \theta_1), (1, \theta_2), \ldots, (1, \theta_\kappa)\}$.
Consequently, by Proposition 22.4.4, in the new scenario T is also sufficient for
guessing H. Since κ in this argument can be as large as we want, it is plausible
that T is also a sufficient statistic for guessing H in our original problem where Θ
is uniform over [−π, π).


The key to the above heuristic argument is that, irrespective of the realization of Θ
and of the value of ν, the signal $S_\nu$ lies in the four-dimensional subspace spanned
by the signals $s_{0,c}$, $s_{0,s}$, $s_{1,c}$, and $s_{1,s}$. The sufficiency thus follows from a more
general theorem that we state next.


Theorem 27.3.1 (White Gaussian Noise with Nuisance Parameters). Let $\mathcal{V}$ be a
d-dimensional subspace of the set of all integrable signals that are bandlimited to W
Hz, and let $(\phi_1, \ldots, \phi_d)$ be an orthonormal basis for $\mathcal{V}$. Let the RV M take value
in a finite set $\mathcal{M}$. Suppose that, conditional on M = m, the SP (Y(t)) is given by

\[ Y(t) = \sum_{\ell=1}^{d} A^{(\ell)} \phi_\ell(t) + N(t), \tag{27.23} \]

where $A = (A^{(1)}, \ldots, A^{(d)})^{\mathsf{T}}$ is a random d-vector whose law typically depends
on m, where the SP (N(t)) is white Gaussian noise with respect to the bandwidth W, and where (N(t)) is independent of the pair (M, A). Then the vector

\[ T = \bigl(\langle Y, \phi_1 \rangle, \ldots, \langle Y, \phi_d \rangle\bigr)^{\mathsf{T}} \tag{27.24} \]

forms a sufficient statistic for guessing M based on (Y(t)).


The theorem also holds in passband, i.e., if $\mathcal{V}$ is a d-dimensional subspace of the set
of all integrable signals that are bandlimited to W Hz around the carrier frequency $f_c$
and if (N(t)) is white with respect to the bandwidth W around $f_c$.


Note 27.3.2. Theorem 27.3.1 continues to hold even if $(\phi_1, \ldots, \phi_d)$ are not orthonormal; it suffices that they form a basis for $\mathcal{V}$.


Proof of Note 27.3.2. This follows from Proposition 22.4.2 and from the observation that if $(u_1, \ldots, u_d)$ forms a basis for $\mathcal{V}$ and if $(v_1, \ldots, v_d)$ forms another
basis for $\mathcal{V}$, then the inner products $\{\langle Y, v_\ell \rangle\}_{\ell=1}^{d}$ are computable from the inner
products $\{\langle Y, u_\ell \rangle\}_{\ell=1}^{d}$ (Lemma 25.10.3).


Before presenting the proof of Theorem 27.3.1 we give two examples of its application. The first is a simple case where, conditional on M, the vector A is
deterministic. This corresponds to the problem of detecting a known signal corrupted by additive white Gaussian noise. This case was treated in Theorem 26.4.1
and slightly generalized in Corollary 26.4.2. We thus see that Theorem 27.3.1 is a
generalization of Theorem 26.4.1 & Corollary 26.4.2.³


The second example of the application of this theorem is for the noncoherent detection problem at hand. Here d = 4 and

\[ \mathcal{V} = \operatorname{span}\bigl(s_{0,c}, s_{0,s}, s_{1,c}, s_{1,s}\bigr), \tag{27.25} \]

with $\phi_1 \triangleq s_{0,c}/\sqrt{E_s}$, $\phi_2 \triangleq s_{0,s}/\sqrt{E_s}$, $\phi_3 \triangleq s_{1,c}/\sqrt{E_s}$, and $\phi_4 \triangleq s_{1,s}/\sqrt{E_s}$. We note
that, conditional on H = 0, the received waveform (Y(t)) can be written in the
form (27.23) where $A^{(3)}$ and $A^{(4)}$ are deterministically zero and the pair $(A^{(1)}, A^{(2)}) = (\sqrt{E_s} \cos\Theta, \sqrt{E_s} \sin\Theta)$
is uniformly distributed over the circle of radius $\sqrt{E_s}$ (the unit circle when $E_s = 1$):

\[ \bigl(A^{(1)}\bigr)^2 + \bigl(A^{(2)}\bigr)^2 = E_s. \]

Similarly, conditional on H = 1, the random variables $A^{(1)}$ and $A^{(2)}$ are deterministically zero and the pair $(A^{(3)}, A^{(4)})$ is uniformly distributed over the same circle.
Thus, once we prove Theorem 27.3.1, it will follow that the vector in (27.22) forms
a sufficient statistic.


Proof of Theorem 27.3.1. To derive the sufficiency of T we need to show that for
every n ∈ N and any choice of the epochs $t_1, \ldots, t_n \in \mathbb{R}$ the random vector T forms
a sufficient statistic for guessing M based on $(Y(t_1), \ldots, Y(t_n), T)$. That is, we
need to show that, irrespective of the prior distribution of M,

\[ M \mathrel{-\!\circ\!-} T \mathrel{-\!\circ\!-} \bigl(Y(t_1), \ldots, Y(t_n)\bigr). \tag{27.26} \]

³ The setup of Corollary 26.4.2 may appear slightly more general than our setting because
the signals $s_1, \ldots, s_n$ are not assumed to be orthonormal. But, using the linearity of the inner
product (Lemma 25.10.3), it is readily seen that from the inner products (27.24) one can compute
the inner products $\{\langle Y, s_j \rangle\}_{j=1}^{n}$ and vice versa.

Define the random variables

\[
\begin{aligned}
\tilde{Y}(t_\kappa) &\triangleq Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, \langle Y, \phi_\ell \rangle && (27.27) \\
&= Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, T^{(\ell)}, \quad \kappa \in \{1, \ldots, n\}, && (27.28)
\end{aligned}
\]

and stack them in a vector $\tilde{Y} \triangleq (\tilde{Y}(t_1), \ldots, \tilde{Y}(t_n))^{\mathsf{T}}$. Since, conditional on T, the
random variables $\tilde{Y}(t_\kappa)$ and $Y(t_\kappa)$ only differ by a constant (which depends on T),
it follows that to prove (27.26) it suffices to prove

\[ M \mathrel{-\!\circ\!-} T \mathrel{-\!\circ\!-} \tilde{Y}. \tag{27.29} \]

Instead of proving (27.29), we shall prove

\[ (M, A) \mathrel{-\!\circ\!-} T \mathrel{-\!\circ\!-} \tilde{Y}, \tag{27.30} \]

which implies (27.29). (If the pair (X, Y) is independent of Z, then X is independent of Z. Likewise if we condition on T: if conditional on T the pair (X, Y) is
independent of Z, then conditional on T we also have that X is independent of Z.)


By Proposition 22.5.5 it follows that to establish (27.30) it suffices to show that

\[ \tilde{Y} \text{ is independent of } (M, A) \tag{27.31} \]

and

\[ T \mathrel{-\!\circ\!-} (M, A) \mathrel{-\!\circ\!-} \tilde{Y}. \tag{27.32} \]


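We first prove (27.31) by showing that conditional on (M, A) = (m, a) the random
vector $\tilde{Y}$ is Gaussian with a mean vector and a covariance matrix that do not
depend on m and a. That conditional on (M, A) = (m, a) the random vector $\tilde{Y}$ is
Gaussian follows because under this conditioning T and $Y(t_1), \ldots, Y(t_n)$ are jointly
Gaussian (Theorem 25.12.1) so the result of linearly transforming them to form $\tilde{Y}$
must also be Gaussian (Proposition 23.6.3). For the mean we have from (27.27)

\[
\begin{aligned}
\mathrm{E}\bigl[\tilde{Y}(t_\kappa) \bigm| (M, A) = (m, a)\bigr]
&= \mathrm{E}\biggl[ Y(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, \langle Y, \phi_\ell \rangle \biggm| (M, A) = (m, a) \biggr] \\
&= \mathrm{E}\bigl[ Y(t_\kappa) \bigm| (M, A) = (m, a) \bigr] - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, \mathrm{E}\bigl[ \langle Y, \phi_\ell \rangle \bigm| (M, A) = (m, a) \bigr] \\
&= \sum_{\ell=1}^{d} a^{(\ell)} \phi_\ell(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, \biggl\langle \sum_{\ell'=1}^{d} a^{(\ell')} \phi_{\ell'}, \phi_\ell \biggr\rangle \\
&= \sum_{\ell=1}^{d} a^{(\ell)} \phi_\ell(t_\kappa) - \sum_{\ell=1}^{d} \phi_\ell(t_\kappa)\, a^{(\ell)} \\
&= 0, \quad \kappa \in \{1, \ldots, n\},
\end{aligned}
\]

where the first equality follows from the definition of $\tilde{Y}(t_\kappa)$; the second from the
linearity of conditional expectation; the third because (N(t)) is of zero mean; and
the fourth from the orthonormality of $(\phi_1, \ldots, \phi_d)$. We thus conclude that for
every $m \in \mathcal{M}$ and every $a \in \mathbb{R}^d$,

\[ \mathrm{E}\bigl[\tilde{Y} \bigm| (M, A) = (m, a)\bigr] = 0. \tag{27.33} \]

Likewise, the conditional covariance matrix of $\tilde{Y}$ given (M, A) = (m, a) does not
depend on the value of m and a: it is the covariance matrix of $(N(t_1), \ldots, N(t_n))^{\mathsf{T}}$.
By establishing that, conditional on (M, A) = (m, a), the vector $\tilde{Y}$ has a multivariate Gaussian distribution whose mean vector and covariance matrix do not depend
on (m, a) we have established (27.31).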


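We next prove (27.32). By Theorem 25.12.1, we have that, conditional on (M, A),
the random vectors T and $\tilde{Y}$ are jointly Gaussian. To establish that they are
conditionally independent given (M, A) it thus suffices to establish that they are
conditionally uncorrelated (Proposition 23.7.3). We now proceed to compute their
conditional covariance and show that it is zero. Since the conditional mean of $\tilde{Y}$
is zero (27.33), it follows that we need to show that

\[
\mathrm{E}\Bigl[ \bigl(T^{(\ell)} - \mathrm{E}\bigl[T^{(\ell)} \bigm| (M, A) = (m, a)\bigr]\bigr)\, \tilde{Y}(t_\kappa) \Bigm| (M, A) = (m, a) \Bigr] = 0,
\]
\[
m \in \mathcal{M}, \quad a \in \mathbb{R}^d, \quad \ell \in \{1, \ldots, d\}, \quad \kappa \in \{1, \ldots, n\}. \tag{27.34}
\]

Before embarking on this calculation, we make two preliminary algebraic manipulations. The first entails using (27.23), (27.24), and the orthonormality of
$(\phi_1, \ldots, \phi_d)$ to express $T^{(\ell)}$ as

\[ T^{(\ell)} = A^{(\ell)} + \langle N, \phi_\ell \rangle, \quad \ell = 1, \ldots, d. \tag{27.35} \]

This representation makes it clear that

\[ T^{(\ell)} - \mathrm{E}\bigl[T^{(\ell)} \bigm| (M, A) = (m, a)\bigr] = \langle N, \phi_\ell \rangle, \quad \ell = 1, \ldots, d. \tag{27.36} \]

The second manipulation involves rewriting $\tilde{Y}(t_\kappa)$ using (27.23) and (27.27) as:

\[
\begin{aligned}
\tilde{Y}(t_\kappa) &= Y(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, T^{(\ell')} \\
&= \sum_{\ell'=1}^{d} A^{(\ell')} \phi_{\ell'}(t_\kappa) + N(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, T^{(\ell')} \\
&= N(t_\kappa) - \sum_{\ell'=1}^{d} \bigl(T^{(\ell')} - A^{(\ell')}\bigr)\, \phi_{\ell'}(t_\kappa) \\
&= N(t_\kappa) - \sum_{\ell'=1}^{d} \langle N, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa), \quad \kappa \in \{1, \ldots, n\},
\end{aligned}
\tag{27.37}
\]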
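where the first equality follows from the definition of $\tilde{Y}(t_\kappa)$ (27.27); the second
from (27.23); the third by rearranging terms; and the final equality from (27.35).


It follows from (27.36) and (27.37) that to establish (27.34) it suffices to show that
for every ℓ ∈ {1, …, d} and κ ∈ {1, …, n}

\[ \mathrm{E}\biggl[ \langle N, \phi_\ell \rangle \biggl( N(t_\kappa) - \sum_{\ell'=1}^{d} \langle N, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa) \biggr) \biggr] = 0. \tag{27.38} \]

This follows from Proposition 25.15.2 and the orthonormality of $(\phi_1, \ldots, \phi_d)$:

\[
\begin{aligned}
\mathrm{E}\biggl[ \langle N, \phi_\ell \rangle \biggl( N(t_\kappa) - \sum_{\ell'=1}^{d} \langle N, \phi_{\ell'} \rangle\, \phi_{\ell'}(t_\kappa) \biggr) \biggr]
&= \mathrm{E}\bigl[ \langle N, \phi_\ell \rangle\, N(t_\kappa) \bigr] - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, \mathrm{E}\bigl[ \langle N, \phi_\ell \rangle \langle N, \phi_{\ell'} \rangle \bigr] \\
&= \frac{N_0}{2} \phi_\ell(t_\kappa) - \sum_{\ell'=1}^{d} \phi_{\ell'}(t_\kappa)\, \frac{N_0}{2}\, \mathrm{I}\{\ell = \ell'\} \\
&= \frac{N_0}{2} \phi_\ell(t_\kappa) - \frac{N_0}{2} \phi_\ell(t_\kappa) \\
&= 0.
\end{aligned}
\]

Combining (27.38) with (27.36) and (27.37) establishes (27.34), i.e., that for every
$m \in \mathcal{M}$, $a \in \mathbb{R}^d$, ℓ ∈ {1, …, d}, and κ ∈ {1, …, n}

\[ \operatorname{Cov}\bigl[ T^{(\ell)}, \tilde{Y}(t_\kappa) \bigm| (M, A) = (m, a) \bigr] = 0. \tag{27.39} \]

This combines with the conditional joint Gaussianity of the vectors T and $\tilde{Y}$ given
(M, A) to establish (27.32). The combination of (27.32) and (27.31) implies
(27.30), which implies (27.29). Since (27.29) is equivalent to (27.26), this establishes the theorem for baseband signals.


For passband signals the proof is almost identical except that in deriving (27.38)
we use Note 25.15.4 instead of Proposition 25.15.2.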
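A finite-dimensional analogue may help to see why the cross-covariance in (27.38) vanishes. The sketch below replaces the noise process by a white Gaussian vector in $\mathbb{R}^n$ with covariance $\sigma^2 I$ and the signals $\phi_\ell$ by orthonormal vectors; this discrete model, the test vectors, and all function names are assumptions made for illustration and are not part of the proof. It evaluates the two second moments exactly, term by term as in the derivation of (27.38).

```python
import math


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize the given vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:  # skip vectors that are (numerically) dependent
            basis.append([wi / norm for wi in w])
    return basis


def cross_covariance(basis, sigma2):
    """Entry [l][k] is Cov(<N, phi_l>, N[k] - sum_l' <N, phi_l'> phi_l'[k])
    for N ~ N(0, sigma2 * I), using E[<N, phi> N[k]] = sigma2 * phi[k]
    and E[<N, phi> <N, phi'>] = sigma2 * <phi, phi'>."""
    n = len(basis[0])
    rows = []
    for phi in basis:
        row = []
        for k in range(n):
            first = sigma2 * phi[k]
            second = sum(sigma2 * dot(phi, other) * other[k] for other in basis)
            row.append(first - second)
        rows.append(row)
    return rows


# Hypothetical example: three vectors in R^6, noise variance 0.5.
basis = gram_schmidt([[1, 1, 0, 0, 0, 0], [0, 1, 2, 0, 1, 0], [1, 0, 0, 3, 0, 1]])
worst = max(abs(x) for row in cross_covariance(basis, 0.5) for x in row)
print("largest |cross-covariance| entry:", worst)
```

The largest entry is at rounding level, mirroring the cancellation of the two $(N_0/2)\,\phi_\ell(t_\kappa)$ terms; together with joint Gaussianity this is what yields independence in the continuous-time argument.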


27.4 The Conditional Law of the Sufficient Statistic 


Having established in the previous section that the vector T defined in (27.21)
forms a sufficient statistic for guessing H based on (Y(t)), we next proceed to
calculate its conditional distribution given H. This will allow us to compute the
likelihood-ratio $f_{T|H=0}(t)/f_{T|H=1}(t)$ and to thus obtain an optimal guessing rule.


Rather than computing the conditional distribution directly, we begin with the
simpler conditional distribution of T given (H, Θ). Conditional on (H, Θ), the
vector T is Gaussian (Theorem 25.12.1). Consequently, to compute its conditional
distribution we only need to compute its conditional mean vector and covariance
matrix, which we proceed to do. Conditional on (H, Θ) = (ν, θ), the observed
process (Y(t)) can be expressed as

\[ Y(t) = s_{\nu,c}(t) \cos\theta + s_{\nu,s}(t) \sin\theta + N(t), \quad t \in \mathbb{R}. \tag{27.40} \]

Hence, since (N(t)) is of zero mean, we have from (27.22) and (27.20)

\[ \mathrm{E}\bigl[T \bigm| (H, \Theta) = (0, \theta)\bigr] = \sqrt{E_s}\, \bigl(\cos\theta, \sin\theta, 0, 0\bigr)^{\mathsf{T}}, \tag{27.41a} \]
\[ \mathrm{E}\bigl[T \bigm| (H, \Theta) = (1, \theta)\bigr] = \sqrt{E_s}\, \bigl(0, 0, \cos\theta, \sin\theta\bigr)^{\mathsf{T}}, \tag{27.41b} \]

as we next calculate. The calculation is a bit tedious because we need to compute
the conditional mean of each of four random variables conditional on each of two
hypotheses, thus requiring eight calculations, which are all very similar but not
identical. We shall carry out only one calculation:

\[
\begin{aligned}
\mathrm{E}\bigl[T_{0,c} \bigm| (H, \Theta) = (0, \theta)\bigr]
&= \frac{1}{\sqrt{E_s}} \Bigl( \bigl\langle s_{0,c} \cos\theta + s_{0,s} \sin\theta, s_{0,c} \bigr\rangle + \mathrm{E}\bigl[\langle N, s_{0,c} \rangle\bigr] \Bigr) \\
&= \frac{1}{\sqrt{E_s}} \bigl\langle s_{0,c} \cos\theta + s_{0,s} \sin\theta, s_{0,c} \bigr\rangle \\
&= \frac{1}{\sqrt{E_s}} \Bigl( \|s_{0,c}\|_2^2 \cos\theta + \langle s_{0,s}, s_{0,c} \rangle \sin\theta \Bigr) \\
&= \sqrt{E_s} \cos\theta,
\end{aligned}
\]

where the first equality follows from (27.40); the second because (N(t)) is of zero
mean (Proposition 25.10.1); the third from the linearity of the inner product and
by writing $\langle s_{0,c}, s_{0,c} \rangle$ as $\|s_{0,c}\|_2^2$; and the final equality from (27.20).


We next compute the conditional covariance matrix of T given (H, Θ) = (ν, θ). By
the orthonormality (27.20) and the whiteness of the noise (Proposition 25.15.2) we
have that, irrespective of ν and θ, this conditional covariance matrix is given by
the 4 × 4 matrix $(N_0/2)\, I_4$, where $I_4$ is the 4 × 4 identity matrix.


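Using the explicit form of the Gaussian distribution (19.6) and defining

\[ \sigma^2 \triangleq \frac{N_0}{2}, \tag{27.42} \]

we can thus write the conditional density as

\[
\begin{aligned}
f_{T|H=0,\Theta=\theta}(t)
&= \frac{1}{(2\pi\sigma^2)^2} \exp\biggl( -\frac{1}{2\sigma^2} \Bigl( \bigl(t_{0,c} - \sqrt{E_s} \cos\theta\bigr)^2 + \bigl(t_{0,s} - \sqrt{E_s} \sin\theta\bigr)^2 + t_{1,c}^2 + t_{1,s}^2 \Bigr) \biggr) \\
&= \frac{1}{(2\pi\sigma^2)^2} \exp\biggl( -\frac{E_s}{2\sigma^2} - \frac{t_0}{2} - \frac{t_1}{2} \biggr) \exp\biggl( \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,c} \cos\theta + \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,s} \sin\theta \biggr), \quad t \in \mathbb{R}^4,
\end{aligned}
\tag{27.43}
\]

where the second equality follows by opening the squares, by using the identity
$\cos^2\theta + \sin^2\theta = 1$, and by defining

\[ T_0 \triangleq \frac{T_{0,c}^2 + T_{0,s}^2}{\sigma^2}, \qquad t_0 \triangleq \frac{t_{0,c}^2 + t_{0,s}^2}{\sigma^2}, \tag{27.44a} \]
\[ T_1 \triangleq \frac{T_{1,c}^2 + T_{1,s}^2}{\sigma^2}, \qquad t_1 \triangleq \frac{t_{1,c}^2 + t_{1,s}^2}{\sigma^2}. \tag{27.44b} \]

(We define $T_0$ and $T_1$ not only to simplify the typesetting but also for ulterior
motives that have to do with the further reduction of the sufficient statistic from
a random vector of four components to one with only two, namely, the vector
$(T_0, T_1)^{\mathsf{T}}$.)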

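To derive $f_{T|H=0}(t)$ (unconditioned on Θ) we can integrate out Θ. Thus, for every
$t = (t_{0,c}, t_{0,s}, t_{1,c}, t_{1,s})^{\mathsf{T}}$ in $\mathbb{R}^4$

\[
\begin{aligned}
f_{T|H=0}(t)
&= \int_{-\pi}^{\pi} f_{\Theta|H=0}(\theta)\, f_{T|H=0,\Theta=\theta}(t)\, \mathrm{d}\theta \\
&= \int_{-\pi}^{\pi} f_{\Theta}(\theta)\, f_{T|H=0,\Theta=\theta}(t)\, \mathrm{d}\theta \\
&= \frac{1}{2\pi} \int_{-\pi}^{\pi} f_{T|H=0,\Theta=\theta}(t)\, \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-t_1/2}\, e^{-t_0/2}\; \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\biggl( \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,c} \cos\theta + \frac{1}{\sigma^2} \sqrt{E_s}\, t_{0,s} \sin\theta \biggr)\, \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\; \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\biggl( \sqrt{\frac{E_s}{\sigma^2}\, t_0}\, \cos\bigl(\theta - \tan^{-1}(t_{0,s}/t_{0,c})\bigr) \biggr)\, \mathrm{d}\theta \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\; \frac{1}{2\pi} \int_{-\pi - \tan^{-1}(t_{0,s}/t_{0,c})}^{\pi - \tan^{-1}(t_{0,s}/t_{0,c})} \exp\biggl( \sqrt{\frac{E_s}{\sigma^2}\, t_0}\, \cos\psi \biggr)\, \mathrm{d}\psi \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\; \frac{1}{2\pi} \int_{-\pi}^{\pi} \exp\biggl( \sqrt{\frac{E_s}{\sigma^2}\, t_0}\, \cos\psi \biggr)\, \mathrm{d}\psi \\
&= \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\; I_0\biggl( \sqrt{\frac{E_s}{\sigma^2}\, t_0} \biggr),
\end{aligned}
\tag{27.45}
\]

where the first equality follows by averaging out Θ; the second because Θ and H
are independent; the third because Θ is uniform; the fourth by the explicit form of
$f_{T|H=0,\Theta=\theta}(t)$ (27.43); the fifth by the trigonometric identity

\[ \alpha \cos\theta + \beta \sin\theta = \sqrt{\alpha^2 + \beta^2}\, \cos\bigl(\theta - \tan^{-1}(\beta/\alpha)\bigr); \tag{27.46} \]

the sixth by the change of variable $\psi \triangleq \theta - \tan^{-1}(t_{0,s}/t_{0,c})$; the seventh from
the periodicity of the cosine function; and the final equality by recalling that the
zeroth-order modified Bessel function $I_0(\cdot)$ is defined by

\[
\begin{aligned}
I_0(\xi) &\triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{\xi \cos\phi}\, \mathrm{d}\phi && (27.47) \\
&= \frac{1}{\pi} \int_{0}^{\pi} e^{\xi \cos\phi}\, \mathrm{d}\phi \\
&= \frac{1}{\pi} \int_{0}^{\pi/2} \bigl( e^{\xi \cos\phi} + e^{-\xi \cos\phi} \bigr)\, \mathrm{d}\phi, \quad \xi \in \mathbb{R}. && (27.48)
\end{aligned}
\]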
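As a numerical sanity check on the definition (27.47), the following sketch approximates the integral with the midpoint rule and compares it with the classical power series $\sum_{k \ge 0} (\xi/2)^{2k}/(k!)^2$ for the zeroth-order modified Bessel function. The series is a standard fact that is not stated in this section, and the function names and step counts are arbitrary choices for illustration.

```python
import math


def bessel_i0_integral(xi: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of (27.47):
    (1 / (2 pi)) * integral over [-pi, pi] of exp(xi * cos(phi))."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -math.pi + (k + 0.5) * h
        total += math.exp(xi * math.cos(phi))
    return total * h / (2.0 * math.pi)


def bessel_i0_series(xi: float, terms: int = 60) -> float:
    """Truncated power series sum_k (xi/2)^(2k) / (k!)^2."""
    total = 0.0
    term = 1.0  # k = 0 term
    for k in range(terms):
        total += term
        term *= (xi / 2.0) ** 2 / ((k + 1) ** 2)
    return total


for xi in (0.0, 0.5, 3.0, 10.0):
    print(xi, bessel_i0_integral(xi), bessel_i0_series(xi))
```

The two columns should agree to many digits, and the values grow with ξ on the nonnegative axis, which is the monotonicity property used when simplifying the detector.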


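By symmetry,

\[ f_{T|H=1}(t) = \frac{1}{(2\pi\sigma^2)^2}\, e^{-E_s/(2\sigma^2)}\, e^{-(t_0 + t_1)/2}\; I_0\biggl( \sqrt{\frac{E_s}{\sigma^2}\, t_1} \biggr), \quad t \in \mathbb{R}^4. \tag{27.49} \]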
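The closed form for the density under H = 0 in (27.45) can be cross-checked by averaging the conditional density (27.43) over θ numerically. The sketch below does so at one arbitrary point; the parameter values and function names are assumptions for illustration, and the Bessel function is evaluated through its standard power series rather than through (27.47) so that the two sides are computed independently.

```python
import math


def density_given_theta(t, theta, es, sigma2):
    """Conditional density (27.43) of T given H = 0 and Theta = theta."""
    t0c, t0s, t1c, t1s = t
    q = ((t0c - math.sqrt(es) * math.cos(theta)) ** 2
         + (t0s - math.sqrt(es) * math.sin(theta)) ** 2
         + t1c ** 2 + t1s ** 2)
    return math.exp(-q / (2.0 * sigma2)) / (2.0 * math.pi * sigma2) ** 2


def density_averaged(t, es, sigma2, steps=4000):
    """Midpoint-rule average of (27.43) over theta uniform on [-pi, pi)."""
    h = 2.0 * math.pi / steps
    total = 0.0
    for k in range(steps):
        total += density_given_theta(t, -math.pi + (k + 0.5) * h, es, sigma2)
    return total / steps


def bessel_i0(xi, terms=80):
    """Power series sum_k (xi/2)^(2k) / (k!)^2 for I_0."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (xi / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def density_closed_form(t, es, sigma2):
    """Last line of (27.45), with t_0 and t_1 as in (27.44)."""
    t0c, t0s, t1c, t1s = t
    t0 = (t0c ** 2 + t0s ** 2) / sigma2
    t1 = (t1c ** 2 + t1s ** 2) / sigma2
    return (math.exp(-es / (2.0 * sigma2)) * math.exp(-(t0 + t1) / 2.0)
            * bessel_i0(math.sqrt(es * t0 / sigma2))
            / (2.0 * math.pi * sigma2) ** 2)


point = (-0.7, 1.2, 0.3, -0.4)   # arbitrary test point in R^4
print(density_averaged(point, es=2.0, sigma2=0.5))
print(density_closed_form(point, es=2.0, sigma2=0.5))
```

The test point deliberately has a negative first component: the arctangent form of (27.46) needs care there, but because the integral runs over a full period the final expression in (27.45) is unaffected.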


27.5 An Optimal Detector 


By (27.45) and (27.49), the likelihood-ratio is given by

\[
\frac{f_{\mathbf{T}|H=0}(\mathbf{t})}{f_{\mathbf{T}|H=1}(\mathbf{t})}
= \frac{I_0\bigl( \sqrt{E_{\mathrm{s}}/\sigma^2}\, \sqrt{t_0} \bigr)}{I_0\bigl( \sqrt{E_{\mathrm{s}}/\sigma^2}\, \sqrt{t_1} \bigr)}, \qquad \mathbf{t} \in \mathbb{R}^4, \tag{27.50}
\]

which is computable from $t_0$ and $t_1$. This proves that the pair $(T_0, T_1)$ defined in (27.44) forms a sufficient statistic for guessing $H$ based on $\mathbf{T}$ (Definition 20.12.2).

Having identified $(T_0, T_1)$ as a sufficient statistic, we now proceed to derive an optimal decision rule using two different methods. The first method, which is summarized in (20.79), ignores the fact that $(T_0, T_1)$ is sufficient and proceeds to base the decision on the likelihood-ratio of $\mathbf{T}$ (27.50). The second method, which is summarized in (20.80), bases the decision on the likelihood-ratio of the pair $(T_0, T_1)$.


Method 1: Since we assumed a uniform prior (27.10), an optimal decision rule is to guess “H = 0” whenever $f_{\mathbf{T}|H=0}(\mathbf{t}) / f_{\mathbf{T}|H=1}(\mathbf{t}) \ge 1$, which, by (27.50), is equivalent to

\[
\text{Guess “$H = 0$” if } I_0\Bigl( \sqrt{\tfrac{E_{\mathrm{s}}}{\sigma^2}}\, \sqrt{t_0} \Bigr) \ge I_0\Bigl( \sqrt{\tfrac{E_{\mathrm{s}}}{\sigma^2}}\, \sqrt{t_1} \Bigr). \tag{27.51}
\]

This rule can be further simplified by noting that $I_0(\xi)$ is (strictly) increasing in $\xi$ for $\xi \ge 0$. (This can be verified by computing the derivative from (27.48)

\[
\frac{d I_0(\xi)}{d\xi} = \frac{1}{\pi} \int_0^{\pi/2} \cos\phi\, \bigl( e^{\xi \cos\phi} - e^{-\xi \cos\phi} \bigr)\, d\phi
\]

and by noting that for $\xi > 0$ the integrand is positive for all $\phi \in (0, \pi/2)$.) Consequently, the function $\xi \mapsto I_0(\sqrt{\xi})$ is also (strictly) increasing and the guessing rule (27.51) is thus equivalent to the rule

\[
\text{Guess “$H = 0$” if } t_0 \ge t_1. \tag{27.52}
\]
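Rule (27.52) compares only the energies of the two (cosine, sine) correlation pairs, which is why it is insensitive to the unknown phase. The sketch below illustrates this; `noiseless_statistics` is a hypothetical helper (not from the text) that returns the conditional means of $(T_{0,c}, T_{0,s}, T_{1,c}, T_{1,s})$ given $(H, \Theta) = (h, \theta)$, i.e., the statistics with the noise removed.

```python
import math

def noncoherent_guess(t0c, t0s, t1c, t1s):
    """Rule (27.52): guess 0 if t0 >= t1, else 1.

    The common factor 1/sigma^2 in (27.44) does not affect the comparison,
    so it is omitted here.
    """
    return 0 if t0c**2 + t0s**2 >= t1c**2 + t1s**2 else 1

def noiseless_statistics(h, theta, es=1.0):
    """Hypothetical helper: conditional means of (T0c, T0s, T1c, T1s)
    given (H, Theta) = (h, theta)."""
    pair = (math.sqrt(es) * math.cos(theta), math.sqrt(es) * math.sin(theta))
    return pair + (0.0, 0.0) if h == 0 else (0.0, 0.0) + pair

if __name__ == "__main__":
    for theta in (-3.0, -1.0, 0.0, 0.7, 2.5):
        print(theta, noncoherent_guess(*noiseless_statistics(0, theta)),
              noncoherent_guess(*noiseless_statistics(1, theta)))
```

In the absence of noise the guess equals the true hypothesis for every phase, because rotating $(t_{0,c}, t_{0,s})$ does not change $t_{0,c}^2 + t_{0,s}^2$.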


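In terms of the observable $(Y(t))$ this can be paraphrased using (27.44) and (27.22) as guessing “H = 0” whenever

\[
\begin{aligned}
&\Bigl( \int_{-\infty}^{\infty} Y(t)\, \operatorname{Re}\bigl( s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_{\mathrm{c}} t} \bigr)\, dt \Bigr)^2
+ \Bigl( \int_{-\infty}^{\infty} Y(t)\, \operatorname{Re}\bigl( i\, s_{0,\mathrm{BB}}(t)\, e^{i 2\pi f_{\mathrm{c}} t} \bigr)\, dt \Bigr)^2 \\
&\quad \ge \Bigl( \int_{-\infty}^{\infty} Y(t)\, \operatorname{Re}\bigl( s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_{\mathrm{c}} t} \bigr)\, dt \Bigr)^2
+ \Bigl( \int_{-\infty}^{\infty} Y(t)\, \operatorname{Re}\bigl( i\, s_{1,\mathrm{BB}}(t)\, e^{i 2\pi f_{\mathrm{c}} t} \bigr)\, dt \Bigr)^2.
\end{aligned}
\]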


Method 2: We next obtain the same result by considering the likelihood-ratio function of the sufficient statistic $(T_0, T_1)$

\[
\frac{f_{T_0,T_1|H=0}(t_0, t_1)}{f_{T_0,T_1|H=1}(t_0, t_1)}.
\]

We begin by arguing that, conditional on $H = 0$, the random variables $T_0$, $T_1$, and $\Theta$ are independent with

\[
f_{T_0,T_1,\Theta|H=0}(t_0, t_1, \theta) = \frac{1}{2\pi}\, f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1), \tag{27.53}
\]

where $f_{\chi^2_{n,\lambda}}(x)$ denotes the density at $x$ of the noncentral $\chi^2$ distribution with $n$ degrees of freedom and noncentrality parameter $\lambda$ (Section 19.8.2), and where

\[
\lambda_0 = 0 \quad \text{and} \quad \lambda_1 = \frac{E_{\mathrm{s}}}{\sigma^2}. \tag{27.54}
\]


To prove (27.53) we compute for every $t_0, t_1 \in \mathbb{R}$ and $\theta \in [-\pi, \pi)$

\[
\begin{aligned}
f_{T_0,T_1,\Theta|H=0}(t_0, t_1, \theta)
&= f_{\Theta|H=0}(\theta)\, f_{T_0,T_1|H=0,\Theta=\theta}(t_0, t_1) \\
&= \frac{1}{2\pi}\, f_{T_0,T_1|H=0,\Theta=\theta}(t_0, t_1) \\
&= \frac{1}{2\pi}\, f_{T_0|H=0,\Theta=\theta}(t_0)\, f_{T_1|H=0,\Theta=\theta}(t_1) \\
&= \frac{1}{2\pi}\, f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1),
\end{aligned}
\]

where the first equality follows from the definition of the conditional density; the second because $\Theta$ is independent of $H$ and is uniformly distributed over the interval $[-\pi, \pi)$; the third because, conditional on $(H, \Theta) = (0, \theta)$, the random variables $T_{0,c}, T_{0,s}, T_{1,c}, T_{1,s}$ are independent (Section 27.4), and because $T_0$ is a function of $(T_{0,c}, T_{0,s})$ whereas $T_1$ is a function of $(T_{1,c}, T_{1,s})$ (see (27.44)); and the final equality follows because, conditional on $(H, \Theta) = (0, \theta)$, the random variables $T_{0,c}, T_{0,s}, T_{1,c}, T_{1,s}$ are variance-$\sigma^2$ Gaussians with means specified in (27.41a) (Section 19.8.2).


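Integrating out $\theta$ in (27.53) we obtain that, conditional on $H$, the random variables $T_0$ and $T_1$ are independent with

\[
f_{T_0,T_1|H=0}(t_0, t_1) = f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1), \tag{27.55a}
\]
\[
f_{T_0,T_1|H=1}(t_0, t_1) = f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1), \tag{27.55b}
\]

where the expression for $f_{T_0,T_1|H=1}(t_0, t_1)$ is obtained using analogous steps.

Since $H$ has a uniform prior, an optimal decision rule is thus to guess “H = 0” whenever

\[
f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1) \ge f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1).
\]

Since $\lambda_1 > \lambda_0$, this will hold, by Proposition 19.8.3, whenever $t_0 \ge t_1$. And by the same proposition the inequality

\[
f_{\chi^2_{2,\lambda_1}}(t_0)\, f_{\chi^2_{2,\lambda_0}}(t_1) \le f_{\chi^2_{2,\lambda_0}}(t_0)\, f_{\chi^2_{2,\lambda_1}}(t_1)
\]

will hold whenever $t_0 \le t_1$. It is thus optimal to guess “H = 0” whenever $t_0 > t_1$ and to guess “H = 1” whenever $t_0 < t_1$. (It does not matter how we guess when $t_0 = t_1$.) The decision rule (27.52) has thus been recovered.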


27.6 The Probability of Error 


In this section we compute the probability of error for the optimal guessing rule (27.52). Since the probability of a tie (i.e., of $T_0 = T_1$) is zero both conditional on $H = 0$ and conditional on $H = 1$, we shall analyze a slightly simpler guessing rule that guesses “H = 0” if $T_0 > T_1$, and guesses “H = 1” if $T_1 \ge T_0$.

We begin with the conditional probability of error given that $H = 0$, i.e., with $\Pr[T_1 \ge T_0 \mid H = 0]$. Conditional on $H = 0$, the question of whether our decoder errs depends prima facie not only on the realization of the additive noise $(N(t))$ but also on the realization of $\Theta$. But this is not the case because, conditionally on $H = 0$, the pair $(T_0, T_1)$ is independent of $\Theta$ (see (27.53)), so the realization of $\Theta$ does not play a role in the sense that for every $\theta \in [-\pi, \pi)$

\[
\Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta] = \Pr[T_1 \ge T_0 \mid H = 0, \Theta = 0]. \tag{27.56}
\]

Conditional on $(H, \Theta) = (0, \theta)$ we have by (27.53) that $T_0$ and $T_1$ are independent with $T_0 \sim \chi^2_{2,\lambda_1}$ and with $T_1 \sim \chi^2_{2,\lambda_0}$, i.e., with $T_1$ having a mean-2 exponential distribution (Note 19.8.1)

\[
f_{T_1|H=0,\Theta=\theta}(t_1) = \frac{1}{2}\, e^{-t_1/2}, \qquad t_1 \ge 0.
\]

Consequently, for every $\theta \in [-\pi, \pi)$ and $\xi \ge 0$,

\[
\Pr[T_1 \ge \xi \mid H = 0, \Theta = \theta] = \int_{\xi}^{\infty} \frac{1}{2}\, e^{-t/2}\, dt = e^{-\xi/2}. \tag{27.57}
\]
g 
Starting with (27.56) we now have for every $\theta \in [-\pi, \pi)$

\[
\begin{aligned}
\Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta]
&= \Pr[T_1 \ge T_0 \mid H = 0, \Theta = 0] \\
&= \int_0^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, \Pr[T_1 \ge t_0 \mid H = 0, \Theta = 0, T_0 = t_0]\, dt_0 \\
&= \int_0^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, \Pr[T_1 \ge t_0 \mid H = 0, \Theta = 0]\, dt_0 \\
&= \int_0^{\infty} f_{T_0|H=0,\Theta=0}(t_0)\, e^{-t_0/2}\, dt_0 \\
&= \mathsf{E}\bigl[ e^{s T_0} \bigm| H = 0, \Theta = 0 \bigr] \Big|_{s=-1/2} \\
&= M_{\chi^2_{2,E_{\mathrm{s}}/\sigma^2}}(s) \Big|_{s=-1/2} \\
&= \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}},
\end{aligned}
\tag{27.58}
\]

where the first equality follows from (27.56); the second from (26.88); the third because conditional on $H = 0$ (and $\Theta = 0$) the random variables $T_0$ and $T_1$ are independent; the fourth from (27.57); the fifth by expressing $\int f_Z(z)\, g(z)\, dz$ as $\mathsf{E}[g(Z)]$ (with $g(\cdot)$ the exponential function); the sixth by the definition of the MGF (19.23) and because, conditional on $H = 0$ and $\Theta = 0$, we have that $T_0 \sim \chi^2_{2,E_{\mathrm{s}}/\sigma^2}$; and the final equality from the explicit expression for the MGF of a $\chi^2_{2,E_{\mathrm{s}}/\sigma^2}$ RV, i.e., from (19.45) with the substitution $n = 2$ for the number of degrees of freedom, $\lambda = E_{\mathrm{s}}/\sigma^2$ for the noncentrality parameter, and $s = -1/2$.
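The fourth line of (27.58) can be checked by direct numerical integration. The sketch below assumes the standard closed form of the noncentral $\chi^2$ density with two degrees of freedom, $f(x) = \tfrac{1}{2} e^{-(x+\lambda)/2} I_0(\sqrt{\lambda x})$ for $x \ge 0$ (this expression is not quoted in the present section); the function names, the truncation point, and the step size are illustrative choices.

```python
import math

def bessel_i0(x, tol=1e-16):
    """I_0(x) via its power series sum_k (x/2)^(2k) / (k!)^2."""
    term, total, k = 1.0, 1.0, 0
    while term > tol * total:
        k += 1
        term *= (x / 2.0) ** 2 / (k * k)
        total += term
    return total

def noncentral_chi2_pdf_2dof(x, lam):
    """Assumed density of a noncentral chi-square RV with 2 degrees of freedom."""
    return 0.5 * math.exp(-(x + lam) / 2.0) * bessel_i0(math.sqrt(lam * x))

def error_given_h0(lam, upper=60.0, n=4000):
    """Midpoint-rule value of the integral of f_{T0}(t0) * exp(-t0/2) over [0, upper]."""
    h = upper / n
    return h * sum(
        noncentral_chi2_pdf_2dof((k + 0.5) * h, lam) * math.exp(-(k + 0.5) * h / 2.0)
        for k in range(n)
    )

if __name__ == "__main__":
    for lam in (0.0, 1.0, 4.0, 10.0):
        print(lam, error_given_h0(lam), 0.5 * math.exp(-lam / 4.0))
```

For each value of $\lambda = E_{\mathrm{s}}/\sigma^2$ the numerical integral should be close to $\tfrac{1}{2} e^{-\lambda/4}$, the right-hand side of (27.58).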


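By symmetry we also have for every $\theta \in [-\pi, \pi)$

\[
\Pr[T_0 \ge T_1 \mid H = 1, \Theta = \theta] = \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}}. \tag{27.59}
\]

Thus, if we denote by $p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta)$ the conditional probability of error of our decoder conditional on $\Theta = \theta$, then by the uniformity of the prior (27.10) and by (27.58) & (27.59)

\[
\begin{aligned}
p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta)
&= \Pr[H = 0]\, p_{\mathrm{MAP}}(\text{error} \mid H = 0, \Theta = \theta) + \Pr[H = 1]\, p_{\mathrm{MAP}}(\text{error} \mid H = 1, \Theta = \theta) \\
&= \frac{1}{2} \Pr[T_1 \ge T_0 \mid H = 0, \Theta = \theta] + \frac{1}{2} \Pr[T_0 \ge T_1 \mid H = 1, \Theta = \theta] \\
&= \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}}, \qquad \theta \in [-\pi, \pi).
\end{aligned}
\tag{27.60}
\]

Integrating (27.60) over $\theta$ yields the optimal unconditional probability of error

\[
p^*(\text{error}) = \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}}. \tag{27.61}
\]

Using (27.42), this can also be expressed as

\[
p^*(\text{error}) = \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{2 N_0}}. \tag{27.62}
\]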
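A quick Monte Carlo experiment can make (27.60) and (27.61) tangible. The sketch below simulates the four Gaussian statistics conditional on $(H, \Theta) = (0, \theta)$ in units where $\sigma = 1$; the function names, trial count, and seed are arbitrary illustrative choices, and the estimate is of course only statistically close to the closed form.

```python
import math
import random

def simulate_error_rate(es_over_sigma2, theta, trials=100_000, seed=0):
    """Estimate Pr[T1 >= T0 | H = 0, Theta = theta] by simulation.

    Works in units where sigma = 1, so sqrt(Es) equals sqrt(es_over_sigma2).
    """
    rng = random.Random(seed)
    amp = math.sqrt(es_over_sigma2)
    errors = 0
    for _ in range(trials):
        t0c = amp * math.cos(theta) + rng.gauss(0.0, 1.0)
        t0s = amp * math.sin(theta) + rng.gauss(0.0, 1.0)
        t1c = rng.gauss(0.0, 1.0)
        t1s = rng.gauss(0.0, 1.0)
        if t1c**2 + t1s**2 >= t0c**2 + t0s**2:
            errors += 1
    return errors / trials

def predicted_error(es_over_sigma2):
    """Closed form (27.61): 0.5 * exp(-Es / (4 sigma^2))."""
    return 0.5 * math.exp(-es_over_sigma2 / 4.0)

if __name__ == "__main__":
    for theta in (0.0, 1.0, -2.5):
        print(theta, simulate_error_rate(4.0, theta), predicted_error(4.0))
```

The estimates for different phases should all hover around the same value, in line with the claim that the conditional probability of error does not depend on $\theta$.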
27.7 Discussion


The detector we derived has the property that its error probability does not depend on the realization of the nuisance parameter $\Theta$; see (27.60). This property makes the detector robust with respect to the distribution of $\Theta$: since the conditional probability of error does not depend on the realization of $\Theta$, neither does the average performance depend on the distribution of $\Theta$. (Of course, if $\Theta$ is not uniform, then our decoder need not be optimal.)


We next show that our guessing rule is also conservative in the sense that it minimizes the worst-case performance:

\[
\sup_{\theta \in [-\pi,\pi)} p(\text{error} \mid \Theta = \theta).
\]

That is, for any guessing rule of conditional error probability $p'(\text{error} \mid \Theta = \theta)$

\[
\sup_{\theta \in [-\pi,\pi)} p'(\text{error} \mid \Theta = \theta) \ge \sup_{\theta \in [-\pi,\pi)} p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta) = \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}}. \tag{27.63}
\]

Thus, while other decoders may outperform our decoder for some realizations of $\Theta$, for other realizations their probability of error will be at least as high. Indeed, if $p'(\text{error} \mid \Theta = \theta)$ is the conditional probability of error associated with any guessing rule, then

\[
\begin{aligned}
\sup_{\theta \in [-\pi,\pi)} p'(\text{error} \mid \Theta = \theta)
&\ge \frac{1}{2\pi} \int_{-\pi}^{\pi} p'(\text{error} \mid \Theta = \theta)\, d\theta \\
&\ge \frac{1}{2\pi} \int_{-\pi}^{\pi} p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta)\, d\theta \\
&= \sup_{\theta \in [-\pi,\pi)} p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta) \\
&= \frac{1}{2}\, e^{-\frac{E_{\mathrm{s}}}{4\sigma^2}},
\end{aligned}
\]

where the first inequality follows because the average (over $\theta$) can never exceed the supremum; the second inequality because the decoder we designed minimizes the unconditional probability of error; and the last two equalities follow from (27.60), i.e., from the fact that the conditional probability of error $p_{\mathrm{MAP}}(\text{error} \mid \Theta = \theta)$ of our decoder does not depend on $\theta$ and is equal to the RHS of (27.60).


It is interesting to assess the degradation in performance due to our ignorance of $\Theta$. To that end we now compare the performance of our detector with that of the “coherent detector.” The coherent decoder is an optimal decoder for the setting where the realization of $\Theta$ is known to the receiver, i.e., when the receiver can form its guess based on both $(Y(t))$ and $\Theta$. If the receiver knows $\Theta = \theta$, then it can compute $S_0$ and $S_1$, and the problem reduces to the problem of deciding which of two equi-energy orthogonal waveforms $S_0$ and $S_1$ is being observed in white Gaussian noise (the binary version of the problem we discussed in Section 26.11.3). An optimal decision rule would be

\[
\text{guess “$H = 0$” if } \int Y(t)\, S_0(t)\, dt \ge \int Y(t)\, S_1(t)\, dt
\]

with resulting probability of error (see (26.93))

\[
\begin{aligned}
p^*_{\text{coherent}}(\text{error} \mid \Theta = \theta)
&= Q\Bigl( \frac{\|S_0 - S_1\|_2}{2\sigma} \Bigr) \\
&= Q\Bigl( \sqrt{\frac{E_{\mathrm{s}}}{2\sigma^2}} \Bigr) \\
&\approx \frac{1}{\sqrt{\pi E_{\mathrm{s}}/\sigma^2}}\, \exp\Bigl( -\frac{E_{\mathrm{s}}}{4\sigma^2} \Bigr), \qquad \frac{E_{\mathrm{s}}}{\sigma^2} \gg 1,
\end{aligned}
\tag{27.64}
\]

where the approximation follows from (19.18). Integrating over $\theta$ we obtain

\[
p^*_{\text{coherent}}(\text{error}) \approx \frac{1}{\sqrt{\pi E_{\mathrm{s}}/\sigma^2}}\, \exp\Bigl( -\frac{E_{\mathrm{s}}}{4\sigma^2} \Bigr), \qquad \frac{E_{\mathrm{s}}}{\sigma^2} \gg 1. \tag{27.65}
\]

Comparing (27.65) with (27.61) we see that if $E_{\mathrm{s}}/\sigma^2$ is large, then we pay only a small penalty for not knowing the phase.⁴ Of course, if the phase were known precisely we might have used antipodal signaling with the resulting probability of error being lower; see (26.72).⁵
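The comparison between (27.61) and (27.64) to (27.65) is easy to tabulate. The sketch below writes $\rho$ for $E_{\mathrm{s}}/\sigma^2$ (a shorthand introduced only here) and evaluates the $Q$-function through the standard identity $Q(x) = \tfrac{1}{2}\operatorname{erfc}(x/\sqrt{2})$; the function names are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def p_noncoherent(rho):
    """(27.61) with rho = Es / sigma^2."""
    return 0.5 * math.exp(-rho / 4.0)

def p_coherent(rho):
    """Exact coherent error probability Q(sqrt(rho / 2)) from (27.64)."""
    return q_function(math.sqrt(rho / 2.0))

def p_coherent_approx(rho):
    """Large-rho approximation (27.65)."""
    return math.exp(-rho / 4.0) / math.sqrt(math.pi * rho)

if __name__ == "__main__":
    for rho in (10.0, 40.0, 100.0):
        print(rho, p_noncoherent(rho), p_coherent(rho), p_coherent_approx(rho))
```

Both probabilities share the exponent $-\rho/4$; under the approximation their ratio is $\tfrac{1}{2}\sqrt{\pi\rho}$, which grows without bound but only polynomially in $\rho$.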


27.8 Extension to M ≥ 2 Signals


We next briefly address the M-ary version of the problem of noncoherent detection of orthogonal signals. We now denote the RV to be guessed by $M$ and replace (27.10) with the assumption that $M$ is uniformly distributed over the set $\mathcal{M} = \{1, \ldots, \mathsf{M}\}$, where $\mathsf{M} \ge 2$. We wish to guess the value of $M$ based on the observation $(Y(t))$ (27.11), where $\nu$ now takes value in $\mathcal{M}$ and where the orthogonality conditions (27.15) & (27.18) are now written as

\[
\langle s_{\nu',\mathrm{BB}}, s_{\nu'',\mathrm{BB}} \rangle = \frac{E_{\mathrm{s}}}{2}\, \mathrm{I}\{\nu' = \nu''\}, \qquad \nu', \nu'' \in \mathcal{M}. \tag{27.66}
\]

We first argue that the vector

\[
(T_1, \ldots, T_{\mathsf{M}})^{\mathsf{T}} \tag{27.67}
\]

forms a sufficient statistic, where, in analogy to (27.44), we define

\[
T_\nu \triangleq \frac{T_{\nu,c}^2 + T_{\nu,s}^2}{\sigma^2}, \qquad \nu \in \mathcal{M},
\]

and where

\[
T_{\nu,c} \triangleq \Bigl\langle Y, \frac{s_{\nu,c}}{\sqrt{E_{\mathrm{s}}}} \Bigr\rangle \quad \text{and} \quad T_{\nu,s} \triangleq \Bigl\langle Y, \frac{s_{\nu,s}}{\sqrt{E_{\mathrm{s}}}} \Bigr\rangle, \qquad \nu \in \mathcal{M}.
\]

To this end, we first note that it is enough that we show pairwise sufficiency (Proposition 22.3.2). Pairwise sufficiency can be proved using Proposition 22.4.2 because for every $m' \ne m''$ in $\mathcal{M}$ our analysis of the binary problem shows that the tuple $(T_{m'}, T_{m''})$ forms a sufficient statistic for testing between $m'$ and $m''$, and this tuple is computable from the vector in (27.67).

⁴ Although $p^*(\text{error}) / p^*_{\text{coherent}}(\text{error})$ tends to infinity, it does so only subexponentially.
⁵ Comparing (26.93) and (26.72) we see that, to achieve the same probability of error, binary orthogonal keying requires twice as much energy as antipodal signaling.


Our analysis of the binary case shows that, after observing $(Y(t))$, the a posteriori probability of the event $M = m$ is larger than the a posteriori probability of the event $M = m'$ whenever $T_m > T_{m'}$. Consequently, Message $m$ has the highest a posteriori probability if $T_m = \max_{m' \in \mathcal{M}} T_{m'}$. Thus, the decision rule

\[
\text{Guess “$M = m$” if } T_m = \max_{m' \in \mathcal{M}} T_{m'} \tag{27.68}
\]

is optimal. The probability of a tie is zero, so it does not matter how ties are resolved.

We next turn to the analysis of the probability of error. We shall assume that a tie results in an error, so, conditional on $M = m$, an error occurs whenever $\max\{T_1, \ldots, T_{m-1}, T_{m+1}, \ldots, T_{\mathsf{M}}\} \ge T_m$. We first show that, as in the binary case, the probability of error associated with this guessing rule depends neither on the realization of $\Theta$ nor on the message, i.e., that for every $m \in \mathcal{M}$ and $\theta \in [-\pi, \pi)$

\[
p_{\mathrm{MAP}}(\text{error} \mid M = m, \Theta = \theta) = p_{\mathrm{MAP}}(\text{error} \mid M = 1, \Theta = 0). \tag{27.69}
\]

To see this note that, conditional on $(M, \Theta) = (m, \theta)$, the components of the vector (27.67) are independent, with the $m$-th component being $\chi^2_{2,E_{\mathrm{s}}/\sigma^2}$ and with the other components being $\chi^2_{2,0}$. Consequently, irrespective of $\theta$ and $m$, the conditional probability of error is the probability that a $\chi^2_{2,E_{\mathrm{s}}/\sigma^2}$ RV is exceeded by, or is equal to, at least one of $\mathsf{M} - 1$ IID $\chi^2_{2,0}$ random variables that are independent of it. In the analysis of the probability of error we shall thus assume that $M = 1$ and that $\theta = 0$.

The probability that the maximum among the random variables $T_2, \ldots, T_{\mathsf{M}}$ exceeds or is equal to $\xi$ is given for every $\xi \ge 0$ by

\[
\begin{aligned}
\Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} \ge \xi \mid M = 1, \Theta = 0]
&= 1 - \Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} < \xi \mid M = 1, \Theta = 0] \\
&= 1 - \Pr[T_2 < \xi, \ldots, T_{\mathsf{M}} < \xi \mid M = 1, \Theta = 0] \\
&= 1 - \bigl( \Pr[T_2 < \xi \mid M = 1, \Theta = 0] \bigr)^{\mathsf{M}-1} \\
&= 1 - \bigl( 1 - e^{-\xi/2} \bigr)^{\mathsf{M}-1} \\
&= 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} e^{-j\xi/2},
\end{aligned}
\tag{27.70}
\]

where the first equality follows because the probabilities of an event and of its complement sum to one; the second because the maximum is smaller than $\xi$ if, and only if, all the random variables are smaller than $\xi$; the third because, conditionally on $M = 1$ and $\Theta = 0$, the random variables $T_2, \ldots, T_{\mathsf{M}}$ are IID; the fourth because conditional on $M = 1$ and $\Theta = 0$, the RV $T_2$ is a mean-2 exponential (Note 19.8.1); and the final equality follows from the binomial formula (26.91) with the substitution $a = 1$, $b = -e^{-\xi/2}$, and $n = \mathsf{M} - 1$. The probability of error is thus:


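\[
\begin{aligned}
&\Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} \ge T_1 \mid M = 1, \Theta = \theta] \\
&\quad = \Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} \ge T_1 \mid M = 1, \Theta = 0] \\
&\quad = \int_0^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, \Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} \ge t_1 \mid M = 1, \Theta = 0, T_1 = t_1]\, dt_1 \\
&\quad = \int_0^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, \Pr[\max\{T_2, \ldots, T_{\mathsf{M}}\} \ge t_1 \mid M = 1, \Theta = 0]\, dt_1 \\
&\quad = \int_0^{\infty} f_{T_1|M=1,\Theta=0}(t_1) \Bigl( 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} e^{-j t_1/2} \Bigr) dt_1 \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j} \int_0^{\infty} f_{T_1|M=1,\Theta=0}(t_1)\, e^{-j t_1/2}\, dt_1 \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j}\, \mathsf{E}\bigl[ e^{s T_1} \bigm| M = 1, \Theta = 0 \bigr] \Big|_{s=-j/2} \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j}\, M_{\chi^2_{2,E_{\mathrm{s}}/\sigma^2}}(s) \Big|_{s=-j/2} \\
&\quad = 1 - \sum_{j=0}^{\mathsf{M}-1} (-1)^j \binom{\mathsf{M}-1}{j}\, \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_{\mathrm{s}}}{2\sigma^2}},
\end{aligned}
\]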

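where the justifications are very similar to the justifications of (27.58) except that we use (27.70) instead of (27.57). Denoting the probability of error by $p^*(\text{error})$ and noting that for $j = 0$ the summand is 1, we have

\[
p^*(\text{error}) = \sum_{j=1}^{\mathsf{M}-1} (-1)^{j+1} \binom{\mathsf{M}-1}{j}\, \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_{\mathrm{s}}}{2\sigma^2}}, \tag{27.71}
\]

or, upon recalling that $\sigma^2$ was defined in (27.42) as $N_0/2$,

\[
p^*(\text{error}) = \sum_{j=1}^{\mathsf{M}-1} (-1)^{j+1} \binom{\mathsf{M}-1}{j}\, \frac{1}{j+1}\, e^{-\frac{j}{j+1} \frac{E_{\mathrm{s}}}{N_0}}. \tag{27.72}
\]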
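The finite sum (27.72) is straightforward to evaluate. The following sketch (the function name and argument names are illustrative) computes it and can be used to confirm that $\mathsf{M} = 2$ recovers the binary result (27.62).

```python
import math

def p_error_mary(m, es_over_n0):
    """Evaluate (27.72) for m orthogonal signals and a given Es/N0 (linear scale)."""
    return sum(
        (-1) ** (j + 1) * math.comb(m - 1, j) / (j + 1)
        * math.exp(-j / (j + 1) * es_over_n0)
        for j in range(1, m)
    )

if __name__ == "__main__":
    for m in (2, 4, 8):
        print(m, p_error_mary(m, 3.0))
    # Binary case for comparison: 0.5 * exp(-Es / (2 N0)), see (27.62).
    print(0.5 * math.exp(-1.5))
```

Because the sum alternates in sign with binomial weights, a plain floating-point evaluation such as this one may lose accuracy for large $\mathsf{M}$; it is meant for small alphabets only.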


27.9 Exercises 


Exercise 27.1 (The Conditional Law of the Sufficient Statistic). Conditional on M =m, 
are the components of the random vector T in Theorem 27.3.1 independent? What about 
conditional on (M, A) = (m, a) for m ∈ M and a ∈ R4?


Exercise 27.2 (A Silly Design Criterion). Let $p(\text{error} \mid \Theta = \theta)$ denote the conditional probability of error given $\Theta = \theta$ of some decision rule for the setup of Section 27.2. Show that

\[
\inf_{\theta \in [-\pi,\pi)} p(\text{error} \mid \Theta = \theta) \ge Q\Bigl( \sqrt{\frac{E_{\mathrm{s}}}{N_0}} \Bigr).
\]

Can you think of a detector that achieves this bound with equality? Would you recommend using it?


Exercise 27.3 (A Coherent Detector for an Incoherent Channel). Alice designs a coherent detector for the setup of Section 27.2 by pretending that $\Theta$ is deterministically equal to zero and by then using the results on the detection of known signals in white Gaussian noise. Show that if her detector is used over our channel where $\Theta \sim \mathcal{U}([-\pi, \pi))$, then the resulting average probability of error (averaged over $\Theta$) is 1/2.


Exercise 27.4 (Noncoherent Antipodal Signaling). Show that if in the setup of Section 27.2 the baseband signals $s_{0,\mathrm{BB}}$ and $s_{1,\mathrm{BB}}$, rather than orthogonal, are antipodal in the sense that $s_{0,\mathrm{BB}} = -s_{1,\mathrm{BB}}$, then the optimal probability of error is 1/2.


Exercise 27.5 (A Fading Scenario). Consider the setup of Section 27.2 but with (27.11) replaced by $Y(t) = A\, S_\nu(t) + N(t)$, where $A$ is a Rayleigh RV that is independent of $(H, \Theta, (N(t)))$. Find an optimal detector and the associated probability of error when $A$ is observed by the receiver. Repeat when $A$ is unobserved.


Exercise 27.6 (Uniform Phase Noise Is the Worst Phase Noise). Consider the setup of Section 27.2 but with $\Theta$ not necessarily uniformly distributed over $[-\pi, \pi)$. Show that the optimal probability of error is upper-bounded by the optimal probability of error corresponding to the case where $\Theta \sim \mathcal{U}([-\pi, \pi))$.


Exercise 27.7 (Unknown Frequency-Selective Channel). Let $H$ take on the values 0 and 1 equiprobably, and let $s$ be an integrable signal that is bandlimited to W Hz. When $H = 0$ the transmitted signal is $s$, and when $H = 1$ it is $-s$. Let $U$ take on the values {up, down} equiprobably and independently of $H$. When $U$ = up the transmitted signal is passed through a stable filter of impulse response $h_{\mathrm{u}}$; when $U$ = down it is passed through a stable filter of impulse response $h_{\mathrm{d}}$. At the receiver, white Gaussian noise $(N(t))$ of PSD $N_0/2$ over the bandwidth W is added to the received signal. The noise is independent of $(H, U)$. Based on the received waveform $(Y(t))$, the receiver wishes to guess $H$. The receiver has no knowledge of the realization of the switch $U$.


(i) Find a two-dimensional sufficient statistic vector $(T_1, T_2)^{\mathsf{T}}$ for this problem.


(ii) Find a decision rule that minimizes the probability of error. Express your rule using the function $\Phi(x, y; \sigma_x^2, \sigma_y^2, \rho)$, which is the value at the point $(x, y)$ of the joint density of the zero-mean jointly Gaussian random variables $X, Y$ of variances $\sigma_x^2$ and $\sigma_y^2$ and covariance $\mathsf{E}[XY] = \sigma_x \sigma_y \rho$.


Exercise 27.8 (Noncoherent Detection with Two Antennas). Consider the setup of Section 27.2 but with the signal now received at two antennas. Denote the received signals by $(Y_1(t))$ and $(Y_2(t))$

\[
Y_1(t) = 2 \operatorname{Re}\bigl( s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_{\mathrm{c}} t + \Theta_1)} \bigr) + N_1(t), \qquad t \in \mathbb{R},
\]
\[
Y_2(t) = 2 \operatorname{Re}\bigl( s_{\nu,\mathrm{BB}}(t)\, e^{i(2\pi f_{\mathrm{c}} t + \Theta_2)} \bigr) + N_2(t), \qquad t \in \mathbb{R},
\]

where the additive white noises $(N_1(t))$ and $(N_2(t))$ at the two antennas are independent.

(i) Suppose that the random phases at the two antennas $\Theta_1$ and $\Theta_2$ are unknown but identical. Find an optimal detector and the optimal probability of error.


(ii) Assume now that $\Theta_1$ and $\Theta_2$ are independent. Find an optimal guessing rule for $H$.


Exercise 27.9 (Unknown Polarity). Consider the setup of Section 27.2 but with $\Theta$ now taking on the values $-\pi$ and 0 equiprobably.


(i) Find an optimal decision rule for guessing $H$.


(ii) Bob suggests accounting for the random phase as follows. Pretend that the transmitted signal is drawn uniformly from the set $\{\pm s_{0,c}, \pm s_{1,c}\}$ and that it is observed in white Gaussian noise. Feed the received signal to an optimal receiver for guessing which of these four signals is being observed in white Gaussian noise, and if the receiver produces the guess “$s_{0,c}$” or “$-s_{0,c}$”, declare “H = 0”; otherwise declare “H = 1”. Is Bob’s receiver optimal?


Exercise 27.10 (Additional Channel Randomness). Consider the setup of Section 27.2 but when the observed SP $(Y(t), t \in \mathbb{R})$, rather than being given by (27.11), is now given by

\[
Y(t) = S_\nu(t) + A\, N(t), \qquad t \in \mathbb{R},
\]

where $A$ is a positive RV that is independent of $(H, \Theta, (N(t)))$. Find an optimal decision rule when $A$ is observed. Repeat when $A$ is not observed.


Exercise 27.11 (Mismatched Noncoherent Detection). Suppose that the signal fed to the detector of Section 27.5 is

\[
2 \operatorname{Re}\bigl( u_{\mathrm{BB}}(t)\, e^{i(2\pi f_{\mathrm{c}} t + \Theta)} \bigr) + N(t), \qquad t \in \mathbb{R},
\]

where $u_{\mathrm{BB}}$ is an integrable signal that is bandlimited to W/2 Hz and that is orthogonal to $s_{0,\mathrm{BB}}$, and where the other quantities are as defined in Section 27.2. Compute the probability that the detector produces the guess “H = 0.” Express your answer in terms of the inner product $\langle u_{\mathrm{BB}}, s_{1,\mathrm{BB}} \rangle$, the energy in $u_{\mathrm{BB}}$, and $N_0$.


Chapter 28 


Detecting PAM and QAM Signals in White 
Gaussian Noise 


28.1 Introduction and Setup 


In Chapter 26 we addressed the problem of detecting one of M bandwidth-W sig- 
nals corrupted by additive Gaussian noise that is white with respect to the band- 
width W. Except for assuming that the mean signals are integrable signals that 
are bandlimited to W Hz, we made no assumptions about their structure. In this 
chapter we study the implication of the results of Chapter 26 for Pulse Amplitude 
Modulation, where the mean signals correspond to different possible outputs of a 
PAM modulator. The conclusions we shall draw are extremely important to the 
design of receivers for systems employing PAM. 


The most important result of this chapter is that, loosely speaking, for PAM signals contaminated by additive white Gaussian noise, the inner products between the received waveform and the time shifts of the pulse shape by integer multiples of the baud period $T_{\mathrm{s}}$ form a sufficient statistic. Thus, if we feed the received waveform to a matched filter that is matched to the pulse shape defining the PAM signals, then the matched filter’s outputs sampled at integer multiples of the baud period $T_{\mathrm{s}}$ form a sufficient statistic (Theorem 5.8.2). Using this result we can reduce the guessing problem from one with an observation consisting of a continuous-time stochastic process to one with an observation consisting of a discrete-time SP.
In fact, since we shall only consider the problem of detecting a finite number of 
data bits, the reduction will be to a finite number of random variables. This will 
justify the canonical structure of a PAM receiver where the received continuous- 
time waveform is fed to a matched filter whose sampled output is then used by the 
decision circuitry to produce its guess. We shall derive the results first for PAM 
and then briefly describe their extension to QAM in Section 28.5. 


The setup we study is one where $k$ data bits $D_1, \ldots, D_k$ are mapped by an encoder $\varphi\colon \{0,1\}^k \to \mathbb{R}^n$ to the real symbols $X_1, \ldots, X_n$, which are then used to produce the transmitted waveform

\[
X(t) = A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_{\mathrm{s}}), \qquad t \in \mathbb{R}, \tag{28.1}
\]

where $A > 0$ is a scaling constant; $T_{\mathrm{s}} > 0$ is the baud period; and $g(\cdot)$ is the pulse shape, which is assumed to be a real integrable signal that is bandlimited to W Hz. The received waveform $(Y(t))$ is given by

\[
\begin{aligned}
Y(t) &= X(t) + N(t) \\
&= A \sum_{\ell=1}^{n} X_\ell\, g(t - \ell T_{\mathrm{s}}) + N(t), \qquad t \in \mathbb{R},
\end{aligned}
\tag{28.2}
\]

where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W and is independent of the data bits $D_1, \ldots, D_k$ and hence also of $(X(t))$. Based on the received waveform $(Y(t))$ we wish to guess the data bits $D_1, \ldots, D_k$.


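To simplify the typesetting we shall stack the $k$ data bits $D_1, \ldots, D_k$ in a vector

\[
\mathbf{D} = (D_1, \ldots, D_k)^{\mathsf{T}}, \tag{28.3}
\]

stack the $n$ symbols $X_1, \ldots, X_n$ in a vector

\[
\mathbf{X} = (X_1, \ldots, X_n)^{\mathsf{T}}, \tag{28.4}
\]

and write

\[
\mathbf{X} = \varphi(\mathbf{D}). \tag{28.5}
\]

We denote the transmitted waveform corresponding to the realization $\mathbf{D} = \mathbf{d}$ by

\[
x(t; \mathbf{d}) = A \sum_{\ell=1}^{n} x_\ell\, g(t - \ell T_{\mathrm{s}}), \qquad t \in \mathbb{R}, \tag{28.6}
\]

where $(x_1, \ldots, x_n)^{\mathsf{T}} = \varphi(\mathbf{d})$ is the real $n$-vector to which $\mathbf{d}$ is mapped by $\varphi(\cdot)$. Thus, conditional on $\mathbf{D} = \mathbf{d}$,

\[
Y(t) = x(t; \mathbf{d}) + N(t), \qquad t \in \mathbb{R}. \tag{28.7}
\]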


28.2 Sufficient Statistic and Its Conditional Law 


We can view the vector $\mathbf{D} = (D_1, \ldots, D_k)^{\mathsf{T}}$ as a message and view the $2^k$ different values it can take as the set of messages. To promote this view we define

\[
\mathcal{D} \triangleq \{0,1\}^k \tag{28.8}
\]

to be the set of all $2^k$ binary $k$-tuples and view $\mathcal{D}$ as the set of possible messages. While in Chapter 21 on multi-hypothesis testing we always denoted the set of messages by $\mathcal{M}$ and assumed that its elements are the integers $1, \ldots, \mathsf{M}$, we never attached a meaning to the “labels” we associated with the messages. So there is no harm in now labeling the messages by the binary $k$-tuples. Associated with every message $\mathbf{d} \in \mathcal{D}$ is its prior $\pi_{\mathbf{d}}$

\[
\begin{aligned}
\pi_{\mathbf{d}} &= \Pr[\mathbf{D} = \mathbf{d}] \\
&= \Pr[D_1 = d_1, \ldots, D_k = d_k], \qquad \mathbf{d} \in \mathcal{D}.
\end{aligned}
\tag{28.9}
\]

If we assume that the data bits are IID random bits (Definition 14.5.1), then $\pi_{\mathbf{d}} = 2^{-k}$ for every $k$-tuple $\mathbf{d} \in \mathcal{D}$, but this assumption is inessential to our derivation of the sufficient statistic. (Recall that sufficiency is defined for a family of conditional distributions; the prior plays no role.)


Conditional on $\mathbf{D} = \mathbf{d}$, the transmitted waveform is given by $x(\cdot; \mathbf{d})$; see (28.6). Thus, the problem of guessing $\mathbf{D}$ is equivalent to guessing which of the $2^k$ signals

\[
\bigl\{ t \mapsto x(t; \mathbf{d}) \bigr\}_{\mathbf{d} \in \mathcal{D}} \tag{28.10}
\]

is being observed in white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W. From (28.6) it follows that for every message $\mathbf{d} \in \mathcal{D}$ the transmitted waveform $t \mapsto x(t; \mathbf{d})$ is a (deterministic) linear combination of the $n$ functions $\{t \mapsto g(t - \ell T_{\mathrm{s}})\}_{\ell=1}^{n}$. Moreover, if the pulse shape $g(\cdot)$ is an integrable function that is bandlimited to W Hz, then so is each waveform $t \mapsto x(t; \mathbf{d})$. Consequently, from Corollary 26.4.2 and from (26.23) we obtain:


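Proposition 28.2.1 (Sufficient Statistic for PAM in White Noise). Let the conditional law of $(Y(t))$ given $\mathbf{D} = \mathbf{d}$ be given by (28.5), (28.6), and (28.7), where the pulse shape $g$ is a real integrable signal that is bandlimited to W Hz, and where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth W. Then the $n$ inner products

\[
T^{(\ell)} \triangleq \int_{-\infty}^{\infty} Y(t)\, g(t - \ell T_{\mathrm{s}})\, dt, \qquad \ell \in \{1, \ldots, n\} \tag{28.11}
\]

form a sufficient statistic for guessing $\mathbf{D}$ based on $(Y(t))$.

Moreover, conditional on $\mathbf{D} = \mathbf{d}$, the vector $\mathbf{T} = (T^{(1)}, \ldots, T^{(n)})^{\mathsf{T}}$ is a Gaussian $n$-vector whose $\ell$-th component $T^{(\ell)}$ is of conditional mean

\[
\mathsf{E}\bigl[ T^{(\ell)} \bigm| \mathbf{D} = \mathbf{d} \bigr] = A \sum_{\ell'=1}^{n} x_{\ell'}\, R_{gg}\bigl( (\ell - \ell') T_{\mathrm{s}} \bigr), \qquad \ell \in \{1, \ldots, n\} \tag{28.12}
\]

and whose conditional covariance matrix is

\[
\frac{N_0}{2}
\begin{pmatrix}
R_{gg}(0) & R_{gg}(T_{\mathrm{s}}) & \cdots & R_{gg}((n-1) T_{\mathrm{s}}) \\
R_{gg}(T_{\mathrm{s}}) & R_{gg}(0) & \cdots & R_{gg}((n-2) T_{\mathrm{s}}) \\
\vdots & \vdots & \ddots & \vdots \\
R_{gg}((n-1) T_{\mathrm{s}}) & R_{gg}((n-2) T_{\mathrm{s}}) & \cdots & R_{gg}(0)
\end{pmatrix},
\tag{28.13}
\]

i.e.,

\[
\operatorname{Cov}\bigl[ T^{(\ell')}, T^{(\ell'')} \bigm| \mathbf{D} = \mathbf{d} \bigr] = \frac{N_0}{2}\, R_{gg}\bigl( (\ell' - \ell'') T_{\mathrm{s}} \bigr), \qquad \ell', \ell'' \in \{1, \ldots, n\}. \tag{28.14}
\]

Here $R_{gg}$ is the self-similarity function of the real pulse shape $g$ (Definition 11.2.1), and $(x_1, \ldots, x_n)^{\mathsf{T}} = \varphi(\mathbf{d})$ is the real $n$-tuple to which $\mathbf{d}$ is encoded.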
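The conditional mean (28.12) and covariance (28.14) depend on the pulse only through samples of $R_{gg}$. The sketch below builds both from a user-supplied self-similarity function. As an illustrative assumption (not taken from the text), the example uses $R_{gg}(\tau) = \operatorname{sinc}(\tau/T)$, the self-similarity function of the unit-energy pulse $g(t) = T^{-1/2} \operatorname{sinc}(t/T)$; all function and parameter names are hypothetical.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def conditional_mean(symbols, amplitude, r_gg, ts):
    """(28.12): mean of T^(l) given the symbols x_1..x_n, for l = 1..n."""
    n = len(symbols)
    return [
        amplitude * sum(symbols[lp] * r_gg((l - lp) * ts) for lp in range(n))
        for l in range(n)
    ]

def conditional_covariance(n, n0, r_gg, ts):
    """(28.13)-(28.14): the n-by-n matrix with entries (N0/2) * R_gg((l' - l'') Ts)."""
    return [[0.5 * n0 * r_gg((a - b) * ts) for b in range(n)] for a in range(n)]

if __name__ == "__main__":
    # Assumed example: R_gg(tau) = sinc(tau / T) with T = 1.
    print(conditional_mean([1.0, -1.0, 1.0], 2.0, sinc, 1.0))  # Ts = T
    print(conditional_covariance(2, 2.0, sinc, 0.5))           # Ts = T / 2
```

With $T_{\mathrm{s}} = T$ the off-diagonal covariances vanish (up to rounding), whereas with $T_{\mathrm{s}} = T/2$ neighboring statistics are correlated and each mean picks up contributions from neighboring symbols.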


Proof. This follows directly from Corollary 26.4.2 and from (26.23) upon substituting the mapping $t \mapsto g(t - \ell T_{\mathrm{s}})$ for $\tilde{s}_\ell$ and upon computing the inner product

\[
\bigl\langle t \mapsto g(t - \ell T_{\mathrm{s}}),\; t \mapsto g(t - \ell' T_{\mathrm{s}}) \bigr\rangle = R_{gg}\bigl( (\ell - \ell') T_{\mathrm{s}} \bigr), \qquad \ell, \ell' \in \mathbb{Z}.
\]

28.3 Consequences of Sufficiency and Other Optimality Criteria


The sufficiency of the random vector $\mathbf{T} = (T^{(1)}, \ldots, T^{(n)})^{\mathsf{T}}$ and Theorem 26.3.2 guarantee that if our design objective is to minimize the probability of a message error, then there is no loss in optimality in basing our guess on $\mathbf{T}$. We shall next consider other design criteria and show that, for these too, there is no loss in optimality in basing our guess on $\mathbf{T}$.


We first elaborate on what a message error is. If we denote our guess by

\[
\hat{\mathbf{d}} = (\hat{d}_1, \ldots, \hat{d}_k)^{\mathsf{T}},
\]

then a message error occurs if our guess differs from the message $\mathbf{d}$ in at least one component, i.e., if $\hat{d}_\ell \ne d_\ell$ for some $\ell \in \{1, \ldots, k\}$. The probability of a message error is thus

\[
\Pr[\hat{\mathbf{D}} \ne \mathbf{D}]. \tag{28.15}
\]


Designing the receiver to minimize the probability of a message error is reasonable, 
for example, when the k data bits constitute a computer file, and we wish to 
minimize the probability that the file is corrupted. In such applications the user is 
often only interested in knowing whether the file was successfully received (no error 
occurred) or if the file was corrupted (at least one error occurred). Minimizing the 
probability of a message error corresponds to minimizing the probability that the 
file is corrupted. 


In other applications, engineers are more interested in the average probability of a bit error or bit error rate (BER). That is, they may wish to minimize

\[
\frac{1}{k} \sum_{j=1}^{k} \Pr[\hat{D}_j \ne D_j]. \tag{28.16}
\]

To better appreciate the difference between the average probability of a bit error (28.16) and the probability of a message error (28.15), define the RV

\[
E_j = \mathrm{I}\{\hat{D}_j \ne D_j\}, \qquad j \in \{1, \ldots, k\},
\]

which indicates whether the $j$-th bit was incorrectly decoded. Minimizing the probability of a message error minimizes

\[
\Pr\Bigl[ \sum_{j=1}^{k} E_j > 0 \Bigr],
\]

whereas minimizing the average probability of a bit error minimizes

\[
\frac{1}{k}\, \mathsf{E}\Bigl[ \sum_{j=1}^{k} E_j \Bigr]. \tag{28.17}
\]


Thus, minimizing the probability of a message error is equivalent to minimizing the probability that one or more of the data bits is corrupted, whereas minimizing the average probability of a bit error is equivalent to minimizing the expected number of data bits that are decoded erroneously.
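The difference between the two criteria can be seen on a handful of decoded messages. The sketch below computes the empirical counterparts of (28.15) and (28.16) from pairs of transmitted and guessed bit tuples; the function names and the sample data are invented for illustration.

```python
def message_error(d, d_hat):
    """1 if the guess differs from the message in at least one bit, else 0."""
    return int(any(a != b for a, b in zip(d, d_hat)))

def bit_error_fraction(d, d_hat):
    """Fraction of positions j with d_hat[j] != d[j], i.e. (1/k) * sum_j E_j."""
    return sum(a != b for a, b in zip(d, d_hat)) / len(d)

def empirical_rates(pairs):
    """Average both criteria over a list of (d, d_hat) pairs."""
    n = len(pairs)
    mer = sum(message_error(d, dh) for d, dh in pairs) / n
    ber = sum(bit_error_fraction(d, dh) for d, dh in pairs) / n
    return mer, ber

if __name__ == "__main__":
    pairs = [
        ((0, 1, 1, 0), (0, 1, 1, 0)),  # no error
        ((0, 1, 1, 0), (1, 1, 1, 0)),  # one bit wrong
        ((1, 1, 1, 1), (0, 0, 0, 0)),  # all bits wrong
        ((1, 0, 0, 1), (1, 0, 0, 1)),  # no error
    ]
    print(empirical_rates(pairs))
```

A message with a single wrong bit and a message with every bit wrong count equally toward the message error rate but very differently toward the bit error rate; in general the bit error rate lies between $1/k$ times the message error rate and the message error rate itself.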


We next argue that there is no loss in optimality in basing our guess on $\mathbf{T}$ also when designing to minimize the average probability of a bit error (28.16). We first note that to minimize (28.16) we should choose for each $j \in \{1, \ldots, k\}$ our guess $\hat{D}_j$ to minimize

\[
\Pr[\hat{D}_j \ne D_j].
\]

That is, we should consider the binary hypothesis testing problem of guessing whether $D_j$ is equal to zero or one, and we should guess $D_j$ to minimize the probability of error associated with this problem. To conclude our argument we next show that for the purpose of minimizing $\Pr[\hat{D}_j \ne D_j]$, there is no loss in optimality in basing our decision on $\mathbf{T}$. To show this, it suffices, by the binary version of Theorem 26.3.2, to establish that $\mathbf{T}$ also forms a sufficient statistic for guessing $D_j$ based on $(Y(t))$. That is, we need to show that for every $\eta \in \mathbb{N}$ and any choice of the epochs $t_1, \ldots, t_\eta \in \mathbb{R}$, the vector $\mathbf{T}$ forms a sufficient statistic for guessing $D_j$ based on $(Y(t_1), \ldots, Y(t_\eta), \mathbf{T})$. This follows from the sufficiency of $\mathbf{T}$ for guessing $\mathbf{D}$ based on $(Y(t_1), \ldots, Y(t_\eta), \mathbf{T})$ and from Proposition 22.4.4, which shows that the sufficiency of $\mathbf{T}$ for guessing $\mathbf{D}$ also implies its sufficiency for guessing whether $\mathbf{D}$ is in the set of $k$-tuples whose $j$-th component is zero or in its complement set of $k$-tuples whose $j$-th component is one.


More generally we have: 


Proposition 28.3.1. Consider the setup of Proposition 28.2.1. Let $\psi\colon \mathbf{d} \mapsto \psi(\mathbf{d})$ be any function of the data bits, and let $\mathbf{D}$ have an arbitrary prior. Then no guessing rule for guessing $\psi(\mathbf{D})$ based on $(Y(t))$ can outperform an optimal rule for guessing $\psi(\mathbf{D})$ based on $T^{(1)}, \ldots, T^{(n)}$.


Proof. Any function from $\{0,1\}^k$ can take on at most $2^k$ different values. Let $q$ denote the number of different values that $\psi(\cdot)$ takes, i.e.,

\[
q \triangleq \# \bigl\{ \psi(\mathbf{d}) : \mathbf{d} \in \{0,1\}^k \bigr\},
\]

where $\# \mathcal{A}$ denotes the number of elements in the set $\mathcal{A}$. Denote these different values by $\gamma_1, \ldots, \gamma_q$. The $q$ subsets of $\mathcal{D}$

\[
\bigl\{ \mathbf{d} \in \{0,1\}^k : \psi(\mathbf{d}) = \gamma_\kappa \bigr\}, \qquad \kappa \in \{1, \ldots, q\}
\]

are disjoint sets whose union is $\{0,1\}^k$. That is, they form a partition of $\{0,1\}^k$. Guessing $\psi(\mathbf{D})$ is equivalent to guessing which subset in this partition contains $\mathbf{D}$. For this we know that $(T^{(1)}, \ldots, T^{(n)})$ forms a sufficient statistic because it forms a sufficient statistic for guessing $\mathbf{D}$ and hence, by Note 22.4.5, it also forms a sufficient statistic for guessing which subset in the partition contains $\mathbf{D}$. The result now follows from Theorem 26.3.2.
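The partition used in the proof can be enumerated explicitly for small $k$. The sketch below groups all binary $k$-tuples by the value of a function $\psi$; the function name `partition_by` and the sample choices of $\psi$ are illustrative only.

```python
from itertools import product

def partition_by(psi, k):
    """Group all binary k-tuples d by the value psi(d).

    Returns a dict mapping each value gamma of psi to the list of tuples d
    with psi(d) == gamma; the lists are disjoint and together cover {0,1}^k.
    """
    cells = {}
    for d in product((0, 1), repeat=k):
        cells.setdefault(psi(d), []).append(d)
    return cells

if __name__ == "__main__":
    k = 3
    print(len(partition_by(lambda d: d, k)))       # psi(d) = d: one cell per message
    print(len(partition_by(lambda d: d[1], k)))    # psi(d) = second bit of d
    print(len(partition_by(lambda d: d[0:2], k)))  # psi(d) = first two bits (a packet)
```

The number of cells is the quantity $q$ of the proof: $2^k$ for the identity, 2 for a single bit, and $2^2$ for a two-bit packet.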


The examples we have seen so far correspond to the case where $\psi\colon \mathbf{d} \mapsto \mathbf{d}$ (with the probability of guessing $\psi(\mathbf{D})$ incorrectly corresponding to a message error) and the case $\psi\colon \mathbf{d} \mapsto d_j$ (with the probability of guessing $\psi(\mathbf{D})$ incorrectly corresponding to the probability that the $j$-th bit $D_j$ is incorrectly decoded). Another useful example is when $\psi\colon \mathbf{d} \mapsto (d_\nu, \ldots, d_{\nu'})$ for some given $\nu, \nu' \in \mathbb{N}$ satisfying $\nu' \ge \nu$. This situation corresponds to the case where $(D_\nu, \ldots, D_{\nu'})$ constitutes a packet and we are interested in the probability that the packet is erroneously decoded.

[Figure 28.1: Block-mode encoding. Each block of $K$ data bits, $(D_1, \ldots, D_K)$, $(D_{K+1}, \ldots, D_{2K})$, ..., $(D_{k-K+1}, \ldots, D_k)$, is fed to $\operatorname{enc}(\cdot)$, which produces the corresponding block of $N$ symbols, $(X_1, \ldots, X_N)$, $(X_{N+1}, \ldots, X_{2N})$, ..., $(X_{n-N+1}, \ldots, X_n)$.]

Yet another example arises in block-mode transmission (which is described in Section 10.4
and depicted in Figure 28.1), where the data bits $D_1,\ldots,D_k$ are
mapped to the symbols $X_1,\ldots,X_n$ using a $(K,N)$ binary-to-reals block encoder

\[ \mathrm{enc}\colon \{0,1\}^K \to \mathbb{R}^N. \]

Here we assume that $k$ is divisible by $K$ and that $n = Nk/K$.

If we wish to guess the $K$-tuple $(D_{(\nu-1)K+1},\ldots,D_{(\nu-1)K+K})$ with the smallest
probability of error, then there is no loss in optimality in basing our guess on
$T^{(1)},\ldots,T^{(n)}$. This follows by applying Proposition 28.3.1 with the function
\[ \psi(\mathbf{d}) = \bigl(d_{(\nu-1)K+1},\ldots,d_{(\nu-1)K+K}\bigr). \]

28.4 Consequences of Orthonormality

The conditional distribution of the inner products in (28.11) becomes simpler when
the time shifts of the pulse shape by integer multiples of $T_s$ are orthonormal. In
this case we denote the pulse shape by $\phi(\cdot)$ and state the orthonormality condition
as

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell,\ell' \in \mathbb{Z}, \tag{28.18} \]

or, equivalently, as

\[ R_{\phi\phi}(\ell T_s) = \begin{cases} 1 & \text{if } \ell = 0, \\ 0 & \text{if } \ell \neq 0, \end{cases} \quad \ell \in \mathbb{Z}. \tag{28.19} \]
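Condition (28.18) can be checked numerically for a candidate pulse. The sketch below assumes, purely for illustration, the unit-energy sinc pulse $\phi(t) = \mathrm{sinc}(t/T_s)/\sqrt{T_s}$ with $T_s = 1$; the function names, the truncation window, and the step size are all assumptions. Note that the sinc pulse is not integrable, so it illustrates only the orthonormality condition itself, not the integrability assumption made elsewhere in this chapter.

```python
import math


def phi(t, Ts=1.0):
    """Hypothetical pulse: unit-energy sinc, bandlimited to 1/(2*Ts) Hz."""
    x = t / Ts
    s = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
    return s / math.sqrt(Ts)


def shift_inner_product(l1, l2, Ts=1.0, half_width=200.0, dt=0.01):
    """Midpoint Riemann-sum approximation of the integral in (28.18)."""
    steps = int(round(2 * half_width / dt))
    total = 0.0
    for i in range(steps):
        t = -half_width + (i + 0.5) * dt
        total += phi(t - l1 * Ts, Ts) * phi(t - l2 * Ts, Ts)
    return total * dt


same = shift_inner_product(2, 2)  # equal shifts: should be close to 1
diff = shift_inner_product(1, 3)  # different shifts: should be close to 0
```

Because the integration window is truncated, the results only approximate 1 and 0; the slowly decaying tails of the sinc pulse limit the accuracy to roughly three decimal places here.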


28.4.1 The Conditional Law of the Sufficient Statistic 


From Proposition 28.2.1 we obtain a key result on PAM communication in white
Gaussian noise:

Corollary 28.4.1. Consider PAM where data bits $D_1,\ldots,D_k$ are mapped by an
encoder to the real symbols $X_1,\ldots,X_n$, which are then mapped to the waveform

\[ X(t) = A \sum_{\ell=1}^{n} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{28.20} \]

where the pulse shape $\phi(\cdot)$ is an integrable signal that is bandlimited to $W$ Hz and
whose time shifts by integer multiples of the baud period $T_s$ are orthonormal. Let
the observed waveform $(Y(t))$ be given by

\[ Y(t) = X(t) + N(t), \quad t \in \mathbb{R}, \]

where $(N(t), t \in \mathbb{R})$ is independent of the data bits and is white Gaussian noise of
PSD $N_0/2$ with respect to the bandwidth $W$.

(i) The $n$ inner products

\[ T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s)\, dt, \quad \ell \in \{1,\ldots,n\} \tag{28.21} \]

form a sufficient statistic for guessing $(D_1,\ldots,D_k)$ based on $(Y(t))$.

(ii) Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding encoder outputs $(X_1,\ldots,X_n) = (x_1,\ldots,x_n)$, the inner products (28.21) are independent with

\[ T^{(\ell)} \sim \mathcal{N}\Bigl(A x_\ell, \frac{N_0}{2}\Bigr), \quad \ell \in \{1,\ldots,n\}. \tag{28.22} \]

(iii) The conditional distribution of these inner products can also be expressed as

\[ T^{(\ell)} = A x_\ell + Z_\ell, \quad \ell \in \{1,\ldots,n\}, \tag{28.23a} \]

where

\[ Z_1,\ldots,Z_n \sim \text{IID } \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr). \tag{28.23b} \]

From Proposition 28.3.1 we obtain that $T^{(1)},\ldots,T^{(n)}$ also form a sufficient statistic
for guessing the value of any function of the data bits $D_1,\ldots,D_k$.
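The conditional law (28.23) is easy to simulate, which gives a quick sanity check on the mean $A x_\ell$ and the variance $N_0/2$. The following is a minimal sketch; the function name, the numerical values of $A$ and $N_0$, the symbol sequence, and the fixed seed are all assumptions chosen for illustration.

```python
import random
import statistics


def matched_filter_outputs(x, A, N0, rng):
    """Draw T^(1),...,T^(n) as in (28.23): T = A*x_l + Z_l, Z_l IID N(0, N0/2)."""
    sigma = (N0 / 2.0) ** 0.5
    return [A * xl + rng.gauss(0.0, sigma) for xl in x]


rng = random.Random(0)        # fixed seed, for reproducibility only
A, N0 = 2.0, 0.5              # hypothetical amplitude and noise level
x = [+1.0, -1.0, -1.0, +1.0]  # hypothetical encoder outputs
draws = [matched_filter_outputs(x, A, N0, rng) for _ in range(20000)]

first = [t[0] for t in draws]
mean_first = sum(first) / len(first)     # should be near A * x[0]
var_first = statistics.pvariance(first)  # should be near N0 / 2
```

The empirical mean and variance of the first component should be close to $A x_1$ and $N_0/2$, up to Monte Carlo fluctuations.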


28.4.2 A Further Reduction in the Sufficient Statistic 


We next show a further reduction (from $n$ to $N$ random variables) of the sufficient
statistic in block-mode transmission (with the pulse shape $\phi(\cdot)$ still satisfying
(28.19)). For this reduction to hold we need to assume that the data bits are
independent or that the $k/K$ tuples

\[ (D_1,\ldots,D_K),\ (D_{K+1},\ldots,D_{2K}),\ \ldots,\ (D_{k-K+1},\ldots,D_k) \tag{28.24} \]

are independent.

Proposition 28.4.2. In addition to the assumptions of Corollary 28.4.1, assume
that $X_1,\ldots,X_n$ are generated from $D_1,\ldots,D_k$ in block-mode using a $(K,N)$ binary-to-reals
block encoder. Further assume that the $K$-tuples in (28.24) are independent.
Then for every $\nu \in \{1,\ldots,k/K\}$, the $N$-tuple

\[ \bigl(T^{((\nu-1)N+1)},\ldots,T^{((\nu-1)N+N)}\bigr) \tag{28.25} \]

forms a sufficient statistic for guessing the $K$-tuple

\[ \bigl(D_{(\nu-1)K+1},\ldots,D_{(\nu-1)K+K}\bigr) \tag{28.26} \]

or any function thereof.


Proof. Fix some $\nu \in \{1,\ldots,k/K\}$. For every choice of $\eta \in \mathbb{N}$ and of the epochs
$t_1,\ldots,t_\eta \in \mathbb{R}$, the $n$-tuple of matched filter outputs $(T^{(1)},\ldots,T^{(n)})$ forms a sufficient
statistic for guessing $D_1,\ldots,D_k$ based on $(Y(t_1),\ldots,Y(t_\eta),\mathbf{T})$ (Proposition 28.2.1).
Consequently, by Note 22.4.5, this $n$-tuple is also sufficient for guessing
the $K$-tuple (28.26). We shall next show that the $N$-tuple (28.25) is sufficient for
guessing the $K$-tuple (28.26) based on the $n$-tuple $(T^{(1)},\ldots,T^{(n)})$. It will then follow
from Proposition 22.4.3 that the $N$-tuple (28.25) is also sufficient for guessing
the $K$-tuple (28.26) based on $(Y(t_1),\ldots,Y(t_\eta),\mathbf{T})$, thus establishing the proposition.

That the $N$-tuple (28.25) is sufficient for guessing the $K$-tuple (28.26) based on the
$n$-tuple $(T^{(1)},\ldots,T^{(n)})$ is equivalent to the irrelevancy of

\[ \mathbf{R} \triangleq \bigl(T^{(1)},\ldots,T^{((\nu-1)N)},\, T^{(\nu N+1)},\ldots,T^{(n)}\bigr)^{\mathsf{T}} \]

for guessing the $K$-tuple (28.26) based on the $N$-tuple (28.25). To prove this irrelevancy,
it suffices to prove two claims: that $\mathbf{R}$ is independent of the $K$-tuple (28.26)
and that, conditionally on this $K$-tuple, $\mathbf{R}$ is independent of the $N$-tuple (28.25)
(Proposition 22.5.5). These claims follow from three observations: that, by the
orthonormality assumption (28.19), $\mathbf{R}$ is determined by the data bits

\[ D_1,\ldots,D_{(\nu-1)K},\, D_{\nu K+1},\ldots,D_k \tag{28.27} \]

and by the random variables

\[ Z_1,\ldots,Z_{(\nu-1)N},\, Z_{\nu N+1},\ldots,Z_n; \tag{28.28} \]

that the $N$-tuple (28.25) is determined by the $K$-tuple (28.26) and by the random
variables

\[ Z_{(\nu-1)N+1},\ldots,Z_{\nu N}; \tag{28.29} \]

and that the tuples in (28.26), (28.27), (28.28), and (28.29) are independent.

Having established that the $N$-tuple (28.25) forms a sufficient statistic for guessing
the $K$-tuple (28.26), it now follows, using arguments very similar to those employed
in proving Proposition 28.3.1, that the $N$-tuple (28.25) is also sufficient for guessing
the value of any function $\psi(\cdot)$ of the $K$-tuple (28.26).

28.4.3 The Discrete-Time Single-Block Model


Proposition 28.4.2 is the starting point of much of the literature on block codes,
upon which we shall touch in Chapter 29. In Coding Theory $N$ is usually called the
blocklength, and $K/N$ is called the rate in bits per dimension. Coding theorists
envision that the function $\mathrm{enc}(\cdot)$ is used to map $k$ bits to $n$ real numbers using the
block-encoding rule of Figure 10.1 (with $k$ being divisible by $K$) and that the resulting
real symbols are then transmitted over a white Gaussian noise channel using
PAM with a pulse shape satisfying the orthonormality condition (28.19). Assuming
that the data tuples are independent, and by then resorting to Proposition 28.4.2,
coding theorists focus on the problem of decoding the $K$-tuple (28.26) from the $N$
matched filter outputs (28.25).

In this problem the index $\nu$ of the block is immaterial, and coding theorists relabel
the data bits of the $K$-tuple (28.26) as $D_1,\ldots,D_K$; they relabel the symbols
to which they are mapped as $X_1,\ldots,X_N$; and they relabel the corresponding
observations as $Y_1,\ldots,Y_N$. The resulting model is the discrete-time single-block
model where

\[ (X_1,\ldots,X_N) = \mathrm{enc}(D_1,\ldots,D_K), \tag{28.30a} \]

\[ Y_\eta = A X_\eta + Z_\eta, \quad \eta \in \{1,\ldots,N\}, \tag{28.30b} \]

\[ Z_\eta \sim \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr), \quad \eta \in \{1,\ldots,N\}, \tag{28.30c} \]

where $Z_1,\ldots,Z_N$ are IID and independent of $D_1,\ldots,D_K$. We recall that this
model is appropriate when the pulse shape $\phi$ satisfies the orthonormality condition
(28.18); the data bits are “block IID” in the sense that the $k/K$ tuples in
(28.24) are independent; and the additive noise is white Gaussian noise of double-sided
spectral density $N_0/2$ with respect to the bandwidth occupied by the pulse
shape $\phi$. It is customary to additionally assume that $D_1,\ldots,D_K$ are IID random
bits (Definition 14.5.1). This is a good assumption if, prior to transmission, the
data bits are compressed using an efficient data compression algorithm.
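The passage from block-mode transmission to the single-block model can be sketched in a few lines. Everything specific below is an assumption made for illustration: the $(K,N) = (2,3)$ encoder (two bits followed by their parity, sent antipodally), the helper names, and the numerical values. The noise level is set to zero so that the output can be read off exactly.

```python
import random

K, N = 2, 3


def enc(bits):
    """Hypothetical (2, 3) encoder: two bits and their parity, mapped 0 -> +1, 1 -> -1."""
    d1, d2 = bits
    return [1.0 - 2.0 * b for b in (d1, d2, d1 ^ d2)]


def block_mode_encode(data_bits):
    """Apply enc to consecutive K-tuples; len(data_bits) must be divisible by K."""
    assert len(data_bits) % K == 0
    symbols = []
    for start in range(0, len(data_bits), K):
        symbols.extend(enc(tuple(data_bits[start:start + K])))
    return symbols


def channel(symbols, A, N0, rng):
    """T^(l) = A*X_l + Z_l with Z_l IID N(0, N0/2), as in (28.23)."""
    sigma = (N0 / 2.0) ** 0.5
    return [A * x + rng.gauss(0.0, sigma) for x in symbols]


def block_observation(T, nu):
    """The N-tuple (28.25) for block nu (1-based indexing, as in the text)."""
    return T[(nu - 1) * N: nu * N]


rng = random.Random(1)
data = [0, 1, 1, 1, 0, 0]  # k = 6 bits, hence k/K = 3 blocks
T = channel(block_mode_encode(data), A=1.5, N0=0.0, rng=rng)
Y = block_observation(T, nu=2)  # observations relevant to (D_3, D_4)
```

With a positive `N0`, the list `Y` is a draw from the single-block model (28.30) for the second block; the other entries of `T` are irrelevant for guessing $(D_3, D_4)$ when the blocks are independent.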


28.5 Extension to QAM Communications 


28.5.1 Introduction and Setup 


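We next extend our discussion to the detection of QAM signals. We assume that
an encoding function

\[ \varphi\colon \{0,1\}^k \to \mathbb{C}^n \]

is used to map the $k$ data bits $\mathbf{D} = (D_1,\ldots,D_k)^{\mathsf{T}}$ to the $n$ complex symbols
$\mathbf{C} = (C_1,\ldots,C_n)^{\mathsf{T}}$ and that the resulting complex symbols are then mapped to
the passband signal $(X_{\mathrm{PB}}(t))$, which is given by

\[ X_{\mathrm{PB}}(t) = 2\operatorname{Re}\bigl(X_{\mathrm{BB}}(t)\, e^{i 2\pi f_c t}\bigr), \quad t \in \mathbb{R}, \]

where

\[ X_{\mathrm{BB}}(t) = A \sum_{\ell=1}^{n} C_\ell\, g(t - \ell T_s), \quad t \in \mathbb{R}; \]

the pulse shape $g(\cdot)$ is a complex integrable signal that is bandlimited to $W/2$ Hz;
$A > 0$ is a real constant; and $f_c > W/2$. Conditionally on $\mathbf{D} = \mathbf{d}$, we denote the
transmitted signal by

\begin{align*}
x(t;\mathbf{d}) &= 2A \operatorname{Re}\Bigl(\sum_{\ell=1}^{n} c_\ell\, g(t - \ell T_s)\, e^{i 2\pi f_c t}\Bigr) \tag{28.31} \\
&= \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Re}(c_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{1}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{I,\ell,\mathrm{BB}}(t)}\, e^{i 2\pi f_c t}\Bigr)}_{g_{I,\ell}(t)} \\
&\quad + \sqrt{2}A \sum_{\ell=1}^{n} \operatorname{Im}(c_\ell)\, \underbrace{2\operatorname{Re}\Bigl(\underbrace{\tfrac{i}{\sqrt{2}}\, g(t - \ell T_s)}_{g_{Q,\ell,\mathrm{BB}}(t)}\, e^{i 2\pi f_c t}\Bigr)}_{g_{Q,\ell}(t)}, \quad t \in \mathbb{R}, \tag{28.32}
\end{align*}

where $\mathbf{c} = \varphi(\mathbf{d})$ is the result of encoding the data bits $\mathbf{d}$; where (28.32) follows from
(16.7); and where $\{g_{I,\ell}\}$, $\{g_{Q,\ell}\}$, $\{g_{I,\ell,\mathrm{BB}}\}$, $\{g_{Q,\ell,\mathrm{BB}}\}$ are as indicated in (28.32)
and as defined in (16.8) and (16.9).

We consider the case where, conditional on $\mathbf{D} = \mathbf{d}$, the received waveform $(Y(t))$
is given by

\[ Y(t) = x(t;\mathbf{d}) + N(t), \quad t \in \mathbb{R}, \tag{28.33} \]

where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth
$W$ around the carrier frequency $f_c$ (Definition 25.15.3).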
28.5.2 Real Sufficient Statistics 


The representation (28.32) makes it clear that for every $\mathbf{d} \in \{0,1\}^k$ the signal
$t \mapsto x(t;\mathbf{d})$ can be expressed as a linear combination of the $2n$ real-valued signals

\[ \{g_{I,\ell}\}_{\ell=1}^{n}, \quad \{g_{Q,\ell}\}_{\ell=1}^{n}. \tag{28.34} \]

Since these signals are integrable signals that are bandlimited to $W$ Hz around the
carrier frequency $f_c$, it follows that the $2n$ inner products

\[ T_I^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{I,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\}, \tag{28.35a} \]

\[ T_Q^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{Q,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\} \tag{28.35b} \]

form a real sufficient statistic for guessing $\mathbf{D}$ based on $(Y(t))$ (Section 26.10).

To describe the distribution of the sufficient statistic conditional on each of the
hypotheses, we next express the inner products between the functions in (28.34) in
terms of the self-similarity function $R_{gg}$ of the complex pulse shape $g$

\[ R_{gg}(\tau) = \int_{-\infty}^{\infty} g(t + \tau)\, g^*(t)\, dt, \quad \tau \in \mathbb{R} \tag{28.36} \]

(Definition 11.2.1). Key to these calculations is the relationship between the inner
product between real passband signals and the inner product between their complex
baseband representations (Theorem 7.6.10). Thus,


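\begin{align*}
\langle g_{I,\ell'}, g_{I,\ell} \rangle &= 2\operatorname{Re}\bigl(\langle g_{I,\ell',\mathrm{BB}}, g_{I,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \operatorname{Re}\bigl(\langle t \mapsto g(t - \ell' T_s),\, t \mapsto g(t - \ell T_s) \rangle\bigr) \\
&= \operatorname{Re}\Bigl(\int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= \operatorname{Re}\bigl(R_{gg}((\ell - \ell') T_s)\bigr), \quad \ell,\ell' \in \mathbb{Z}, \tag{28.37a}
\end{align*}

where the first equality follows by relating the inner product in passband to the
inner product in baseband; the second from the expressions for the corresponding
baseband representations (16.9a); the third from the definition of the inner product
for complex-valued signals (3.4); and the final equality from the definition of the
self-similarity function (28.36). Similarly,

\begin{align*}
\langle g_{Q,\ell'}, g_{Q,\ell} \rangle &= 2\operatorname{Re}\bigl(\langle g_{Q,\ell',\mathrm{BB}}, g_{Q,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \operatorname{Re}\bigl(\langle t \mapsto i g(t - \ell' T_s),\, t \mapsto i g(t - \ell T_s) \rangle\bigr) \\
&= \operatorname{Re}\Bigl(\int_{-\infty}^{\infty} i g(t - \ell' T_s)\, (-i)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= \operatorname{Re}\Bigl(\int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= \operatorname{Re}\bigl(R_{gg}((\ell - \ell') T_s)\bigr), \quad \ell,\ell' \in \mathbb{Z}, \tag{28.37b}
\end{align*}

and

\begin{align*}
\langle g_{Q,\ell'}, g_{I,\ell} \rangle &= 2\operatorname{Re}\bigl(\langle g_{Q,\ell',\mathrm{BB}}, g_{I,\ell,\mathrm{BB}} \rangle\bigr) \\
&= \operatorname{Re}\bigl(\langle t \mapsto i g(t - \ell' T_s),\, t \mapsto g(t - \ell T_s) \rangle\bigr) \\
&= \operatorname{Re}\Bigl(i \int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= -\operatorname{Im}\Bigl(\int_{-\infty}^{\infty} g(t - \ell' T_s)\, g^*(t - \ell T_s)\, dt\Bigr) \\
&= -\operatorname{Im}\bigl(R_{gg}((\ell - \ell') T_s)\bigr), \quad \ell,\ell' \in \mathbb{Z}, \tag{28.37c}
\end{align*}

where the first equality leading to (28.37c) follows from the relationship between
the inner product between real passband signals and the inner product between
their baseband representations (Theorem 7.6.10); the second from the expressions
for the corresponding baseband representations (16.9); the third from the definition
of the inner product between complex signals (3.4); the fourth from the identity
$\operatorname{Re}(iz) = -\operatorname{Im}(z)$; and where the last equality follows from the definition of the
self-similarity function of complex signals (28.36).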


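We are now ready to compute the conditional law of the sufficient statistic given
each of the hypotheses. Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding $\mathbf{c} = \varphi(\mathbf{d})$,
the $2n$ random variables $\{T_I^{(\ell)}\}$, $\{T_Q^{(\ell)}\}$ are jointly Gaussian (Section 26.10). Their
conditional law is thus fully specified by the conditional mean vector and by the
conditional covariance matrix. We begin with the computation of the former:

\begin{align*}
\mathsf{E}\bigl[T_I^{(\ell)} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]
&= \bigl\langle t \mapsto x(t;\mathbf{d}),\, g_{I,\ell} \bigr\rangle \\
&= \Bigl\langle \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Re}(c_{\ell'})\, g_{I,\ell'} + \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Im}(c_{\ell'})\, g_{Q,\ell'},\ g_{I,\ell} \Bigr\rangle \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \Bigl(\operatorname{Re}(c_{\ell'})\, \langle g_{I,\ell'}, g_{I,\ell} \rangle + \operatorname{Im}(c_{\ell'})\, \langle g_{Q,\ell'}, g_{I,\ell} \rangle\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \Bigl(\operatorname{Re}(c_{\ell'}) \operatorname{Re}\bigl(R_{gg}((\ell - \ell') T_s)\bigr) - \operatorname{Im}(c_{\ell'}) \operatorname{Im}\bigl(R_{gg}((\ell - \ell') T_s)\bigr)\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Re}\bigl(c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\bigr) \\
&= \sqrt{2}A \operatorname{Re}\Bigl(\sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\Bigr), \tag{28.38a}
\end{align*}

where the first equality follows from the definition of $T_I^{(\ell)}$ (28.35a), from (28.33),
and from our assumption that the noise $(N(t))$ is of zero mean; the second from
(28.32); the third from the linearity of the inner product; the fourth by expressing
the inner products using the self-similarity function, i.e., using (28.37a) and
(28.37c); the fifth by the complex-numbers identity $\operatorname{Re}(wz) = \operatorname{Re}(w)\operatorname{Re}(z) - \operatorname{Im}(w)\operatorname{Im}(z)$;
and the final equality because the sum of the real parts is the real
part of the sum. Similarly,

\begin{align*}
\mathsf{E}\bigl[T_Q^{(\ell)} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]
&= \bigl\langle t \mapsto x(t;\mathbf{d}),\, g_{Q,\ell} \bigr\rangle \\
&= \Bigl\langle \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Re}(c_{\ell'})\, g_{I,\ell'} + \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Im}(c_{\ell'})\, g_{Q,\ell'},\ g_{Q,\ell} \Bigr\rangle \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \Bigl(\operatorname{Re}(c_{\ell'})\, \langle g_{I,\ell'}, g_{Q,\ell} \rangle + \operatorname{Im}(c_{\ell'})\, \langle g_{Q,\ell'}, g_{Q,\ell} \rangle\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \Bigl(\operatorname{Re}(c_{\ell'}) \bigl(-\operatorname{Im}\bigl(R_{gg}((\ell' - \ell) T_s)\bigr)\bigr) + \operatorname{Im}(c_{\ell'}) \operatorname{Re}\bigl(R_{gg}((\ell - \ell') T_s)\bigr)\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \Bigl(\operatorname{Re}(c_{\ell'}) \operatorname{Im}\bigl(R_{gg}((\ell - \ell') T_s)\bigr) + \operatorname{Im}(c_{\ell'}) \operatorname{Re}\bigl(R_{gg}((\ell - \ell') T_s)\bigr)\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} \operatorname{Im}\bigl(c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\bigr) \\
&= \sqrt{2}A \operatorname{Im}\Bigl(\sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\Bigr), \tag{28.38b}
\end{align*}

where the first equality follows from the definition of $T_Q^{(\ell)}$ (28.35b), from (28.33),
and from our assumption that the noise $(N(t))$ is of zero mean; the second from
(28.32); the third from the linearity of the inner product; the fourth by expressing
the inner products using the self-similarity function, i.e., using (28.37c) and
(28.37b); the fifth by the conjugate symmetry of the self-similarity function
(Proposition 11.2.2); the sixth by the complex-numbers identity $\operatorname{Im}(wz) = \operatorname{Re}(w)\operatorname{Im}(z) + \operatorname{Im}(w)\operatorname{Re}(z)$;
and the final equality by noting that the sum of the imaginary parts
is equal to the imaginary part of the sum.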


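The conditional covariances are easily computed using Note 25.15.4. Using the
inner-product expressions (28.37), we obtain:

\begin{align*}
\operatorname{Cov}\bigl[T_I^{(\ell')}, T_I^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] &= \frac{N_0}{2}\, \langle g_{I,\ell'}, g_{I,\ell''} \rangle \\
&= \frac{N_0}{2} \operatorname{Re}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr), \tag{28.39a}
\end{align*}

\begin{align*}
\operatorname{Cov}\bigl[T_Q^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] &= \frac{N_0}{2}\, \langle g_{Q,\ell'}, g_{Q,\ell''} \rangle \\
&= \frac{N_0}{2} \operatorname{Re}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr), \tag{28.39b}
\end{align*}

and

\begin{align*}
\operatorname{Cov}\bigl[T_I^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] &= \frac{N_0}{2}\, \langle g_{I,\ell'}, g_{Q,\ell''} \rangle \\
&= -\frac{N_0}{2} \operatorname{Im}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr). \tag{28.39c}
\end{align*}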


We summarize our results on QAM detection in white Gaussian noise as follows. 


Proposition 28.5.1 (QAM Detection in White Noise: Real Sufficient Statistics).
Let a QAM signal (28.32) of an integrable pulse shape $g(\cdot)$ that is bandlimited
to $W/2$ Hz be observed in white Gaussian noise of PSD $N_0/2$ with respect to the
bandwidth $W$ around the carrier frequency $f_c$. Then:

(i) The $2n$ inner products

\[ T_I^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{I,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\}, \tag{28.40a} \]

\[ T_Q^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{Q,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\} \tag{28.40b} \]

form a sufficient statistic for guessing $\mathbf{D}$ based on $(Y(t))$, where

\[ g_{I,\ell}(t) = 2\operatorname{Re}\Bigl(\frac{1}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i 2\pi f_c t}\Bigr), \quad t \in \mathbb{R}, \]

\[ g_{Q,\ell}(t) = 2\operatorname{Re}\Bigl(\frac{i}{\sqrt{2}}\, g(t - \ell T_s)\, e^{i 2\pi f_c t}\Bigr), \quad t \in \mathbb{R}. \]

(ii) Conditional on $\mathbf{D} = \mathbf{d}$ with corresponding transmitted symbols $\mathbf{c} = \varphi(\mathbf{d})$,
these $2n$ real random variables are jointly Gaussian with conditional means as
specified by (28.38) and with conditional covariances as specified by (28.39).


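28.5.3 Complex Sufficient Statistics

The notation is simpler if we introduce the $n$ complex random variables

\begin{align*}
T^{(\ell)} &\triangleq T_I^{(\ell)} + i\, T_Q^{(\ell)} \\
&= \int_{-\infty}^{\infty} Y(t)\, g_{I,\ell}(t)\, dt + i \int_{-\infty}^{\infty} Y(t)\, g_{Q,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\}. \tag{28.41}
\end{align*}

These $n$ complex random variables form a sufficient statistic in the sense that their
real and imaginary parts form a sufficient statistic. Using (28.38) we obtain

\begin{align*}
\mathsf{E}\bigl[T^{(\ell)} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]
&= \mathsf{E}\bigl[T_I^{(\ell)} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] + i\, \mathsf{E}\bigl[T_Q^{(\ell)} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&= \sqrt{2}A \operatorname{Re}\Bigl(\sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\Bigr) + i \sqrt{2}A \operatorname{Im}\Bigl(\sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s)\Bigr) \\
&= \sqrt{2}A \sum_{\ell'=1}^{n} c_{\ell'}\, R_{gg}((\ell - \ell') T_s), \quad \ell \in \{1,\ldots,n\}. \tag{28.42}
\end{align*}

The advantage of the complex notation is that, as we shall see in Proposition 28.5.2
ahead, conditional on $\mathbf{D} = \mathbf{d}$, the random vector $\mathbf{T} - \mathsf{E}[\mathbf{T} \,|\, \mathbf{D} = \mathbf{d}]$ is proper
(Definition 17.4.1). And since conditionally on $\mathbf{D} = \mathbf{d}$ it is also Gaussian, it follows from
Proposition 24.3.11 that, conditional on $\mathbf{D} = \mathbf{d}$, the random vector $\mathbf{T} - \mathsf{E}[\mathbf{T} \,|\, \mathbf{D} = \mathbf{d}]$
is a circularly-symmetric complex Gaussian (Definition 24.3.2). Its conditional law
is thus determined by its conditional covariance matrix (Corollary 24.3.8). This
covariance matrix is an $n \times n$ (complex) matrix, whereas the covariance matrix for
the $2n$ real variables in Proposition 28.5.1 is a $(2n) \times (2n)$ (real) matrix.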


We summarize our results for QAM detection with complex sufficient statistics in 
the following. 


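Proposition 28.5.2 (QAM in White Noise: Complex Sufficient Statistics). Consider
the setup of Proposition 28.5.1.

(i) The complex random vector $\mathbf{T} = (T^{(1)},\ldots,T^{(n)})^{\mathsf{T}}$ defined by

\[ T^{(\ell)} = \int_{-\infty}^{\infty} Y(t)\, g_{I,\ell}(t)\, dt + i \int_{-\infty}^{\infty} Y(t)\, g_{Q,\ell}(t)\, dt, \quad \ell \in \{1,\ldots,n\}, \]

forms a sufficient statistic for guessing $\mathbf{D}$ based on $(Y(t))$.

(ii) The $\ell$-th component of $\mathbf{T}$ can be expressed as

\[ T^{(\ell)} = \sqrt{2}A \sum_{\ell'=1}^{n} C_{\ell'}\, R_{gg}((\ell - \ell') T_s) + Z^{(\ell)}, \quad \ell \in \{1,\ldots,n\}, \]

where $R_{gg}$ is the self-similarity function of the pulse shape $g(\cdot)$ (28.36), and
where the random vector $\mathbf{Z} = (Z^{(1)},\ldots,Z^{(n)})^{\mathsf{T}}$ is independent of $\mathbf{D}$ and is a
circularly-symmetric complex Gaussian of covariance

\begin{align*}
\operatorname{Cov}\bigl[Z^{(\ell')}, Z^{(\ell'')}\bigr] &= \mathsf{E}\bigl[Z^{(\ell')} (Z^{(\ell'')})^*\bigr] \\
&= N_0\, R_{gg}((\ell' - \ell'') T_s), \quad \ell',\ell'' \in \{1,\ldots,n\}. \tag{28.43}
\end{align*}

(iii) If the time shifts of the pulse shape by integer multiples of $T_s$ are orthonormal,
then

\[ T^{(\ell)} = \sqrt{2}A\, C_\ell + Z^{(\ell)}, \quad \ell \in \{1,\ldots,n\}, \tag{28.44} \]

where the complex random variables $\{Z^{(\ell)}\}$ are independent of $\{D_j\}$ and are
IID circularly-symmetric complex Gaussians of variance $N_0$.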
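Part (iii) lends itself to a quick simulation. The sketch below assumes a hypothetical pair of 4-QAM symbols and arbitrary values of $A$ and $N_0$; it relies on the standard fact that a circularly-symmetric complex Gaussian of variance $N_0$ has independent real and imaginary parts, each of variance $N_0/2$.

```python
import random


def qam_outputs(c, A, N0, rng):
    """Draw T^(l) = sqrt(2)*A*c_l + Z^(l) as in (28.44)."""
    sigma = (N0 / 2.0) ** 0.5  # standard deviation per real dimension
    out = []
    for cl in c:
        z = complex(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        out.append(2.0 ** 0.5 * A * cl + z)
    return out


rng = random.Random(0)  # fixed seed, for reproducibility only
A, N0 = 1.0, 0.2        # hypothetical amplitude and noise level
c = [1 + 1j, -1 + 1j]   # hypothetical 4-QAM symbols
draws = [qam_outputs(c, A, N0, rng) for _ in range(20000)]

noise = [t[0] - 2.0 ** 0.5 * A * c[0] for t in draws]
var_z = sum(abs(z) ** 2 for z in noise) / len(noise)  # estimate of E[|Z|^2]
pseudo = sum(z * z for z in noise) / len(noise)       # estimate of E[Z^2]
```

The estimate `var_z` should be close to $N_0$, and `pseudo` should be close to zero, which reflects the properness of the noise.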


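Proof. Part (i) follows directly from Proposition 28.5.1 because, by definition, the
sufficiency of $\mathbf{T}$ is equivalent to the sufficiency of its real and imaginary parts.

To prove Part (ii) define

\[ Z^{(\ell)} \triangleq T^{(\ell)} - \sqrt{2}A \sum_{\ell'=1}^{n} C_{\ell'}\, R_{gg}((\ell - \ell') T_s), \quad \ell \in \{1,\ldots,n\}, \tag{28.45} \]

and note that by (28.42) the conditional distribution of $\mathbf{Z}$ given $\mathbf{D} = \mathbf{d}$ is of zero
mean. Moreover, from Proposition 28.5.1 and from the definition of a complex
Gaussian random vector as one whose real and imaginary parts are jointly Gaussian
(Definition 24.3.6), it follows that, conditional on $\mathbf{D} = \mathbf{d}$, the vector $\mathbf{Z}$ is Gaussian.
To prove that it is proper we compute

\begin{align*}
\mathsf{E}\bigl[Z^{(\ell')} Z^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]
&= \mathsf{E}\bigl[\operatorname{Re}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) - \operatorname{Im}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&\quad + i\, \mathsf{E}\bigl[\operatorname{Re}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&= \operatorname{Cov}\bigl[T_I^{(\ell')}, T_I^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] - \operatorname{Cov}\bigl[T_Q^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&\quad + i \Bigl(\operatorname{Cov}\bigl[T_I^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] + \operatorname{Cov}\bigl[T_Q^{(\ell')}, T_I^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]\Bigr) \\
&= 0, \quad \ell',\ell'' \in \{1,\ldots,n\},
\end{align*}

where the second equality follows from (28.45) and the last equality from (28.39).

The calculation of the conditional covariance matrix is very similar except that $Z^{(\ell'')}$
is now conjugated:

\begin{align*}
\operatorname{Cov}\bigl[Z^{(\ell')}, Z^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]
&= \mathsf{E}\bigl[\operatorname{Re}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&\quad + i\, \mathsf{E}\bigl[-\operatorname{Re}(Z^{(\ell')}) \operatorname{Im}(Z^{(\ell'')}) + \operatorname{Im}(Z^{(\ell')}) \operatorname{Re}(Z^{(\ell'')}) \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&= \operatorname{Cov}\bigl[T_I^{(\ell')}, T_I^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] + \operatorname{Cov}\bigl[T_Q^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] \\
&\quad + i \Bigl(-\operatorname{Cov}\bigl[T_I^{(\ell')}, T_Q^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr] + \operatorname{Cov}\bigl[T_Q^{(\ell')}, T_I^{(\ell'')} \,\big|\, \mathbf{D} = \mathbf{d}\bigr]\Bigr) \\
&= \frac{N_0}{2} \operatorname{Re}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr) + \frac{N_0}{2} \operatorname{Re}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr) \\
&\quad + i \Bigl(\frac{N_0}{2} \operatorname{Im}\bigl(R_{gg}((\ell' - \ell'') T_s)\bigr) - \frac{N_0}{2} \operatorname{Im}\bigl(R_{gg}((\ell'' - \ell') T_s)\bigr)\Bigr) \\
&= N_0\, R_{gg}((\ell' - \ell'') T_s), \quad \ell',\ell'' \in \{1,\ldots,n\}, \tag{28.46}
\end{align*}

where the first equality follows from the definition of the covariance between complex
random variables (17.17); the second by (28.45); the third by (28.39); and the
last equality by the conjugate-symmetry of the self-similarity function
(Proposition 11.2.2 (iii)).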
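The bookkeeping in the two calculations above can be checked on numbers. The sketch below takes an arbitrary (hypothetical) complex value for $R_{gg}((\ell' - \ell'')T_s)$, forms the four real covariances according to (28.39), and combines them as in the proof; the variable names are assumptions made for this example.

```python
def real_covariances(R_fwd, R_bwd, N0):
    """The four real covariances obtained from (28.39).

    R_fwd stands for R_gg((l' - l'') Ts) and R_bwd for R_gg((l'' - l') Ts).
    """
    cov_II = N0 / 2 * R_fwd.real
    cov_QQ = N0 / 2 * R_fwd.real
    cov_IQ = -N0 / 2 * R_fwd.imag  # Cov[T_I^(l'), T_Q^(l'')]
    cov_QI = -N0 / 2 * R_bwd.imag  # Cov[T_Q^(l'), T_I^(l'')]
    return cov_II, cov_QQ, cov_IQ, cov_QI


N0 = 0.8
R_fwd = 0.3 - 0.4j         # hypothetical value of R_gg at a nonzero lag
R_bwd = R_fwd.conjugate()  # conjugate symmetry of the self-similarity function
cov_II, cov_QQ, cov_IQ, cov_QI = real_covariances(R_fwd, R_bwd, N0)

pseudo_cov = (cov_II - cov_QQ) + 1j * (cov_IQ + cov_QI)   # E[Z' Z'']
covariance = (cov_II + cov_QQ) + 1j * (-cov_IQ + cov_QI)  # E[Z' conj(Z'')]
```

For any choice of `R_fwd`, the pseudo-covariance vanishes and the covariance equals `N0 * R_fwd`, in agreement with (28.46).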

Conditional on $\mathbf{D} = \mathbf{d}$, the complex $n$-vector $\mathbf{Z}$ is thus a proper Gaussian, and its
conditional law is thus fully specified by its conditional covariance matrix
(Corollary 24.3.8). By (28.46), this conditional covariance matrix does not depend on $\mathbf{d}$,
and we thus conclude that the conditional law of $\mathbf{Z}$ conditional on $\mathbf{D} = \mathbf{d}$ does not
depend on $\mathbf{d}$, i.e., that $\mathbf{Z}$ is independent of $\mathbf{D}$.


Part (iii) follows from Part (ii). 


28.6 Additional Reading 


Proposition 28.2.1 and Proposition 28.5.2 are the starting points of much of the
literature on equalization and on the use of the Viterbi Algorithm for channels
with inter-symbol interference (ISI). See, for example, (Proakis, 2000, Chapter 10),
(Viterbi and Omura, 1979, Chapter 4, Section 4.9), and (Barry, Lee, and Messerschmitt,
2004, Chapter 8).


28.7 Exercises 


Exercise 28.1 (A Dispersive Channel). Let the transmitted signal $(X(t))$ be as in (28.1),
and let the received signal $(Y(t))$ be given by

\[ Y(t) = (X \star h)(t) + N(t), \quad t \in \mathbb{R}, \]

where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth $W$,
and where $h$ is the impulse response of some stable real filter.

(i) Show that the $n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h)(t - \ell T_s)\, dt, \quad \ell \in \{1,\ldots,n\} \]

form a sufficient statistic for guessing $D_1,\ldots,D_k$ based on $(Y(t))$.

(ii) Compute their conditional law.


Exercise 28.2 (PAM in Colored Noise). Let the transmitted signal $(X(t))$ be as in (28.1),
and let the received signal $(Y(t))$ be given by

\[ Y(t) = X(t) + N(t), \quad t \in \mathbb{R}, \]

where $(N(t))$ is a centered, stationary, measurable, Gaussian SP of PSD $S_{NN}$ that can be
whitened with respect to the bandwidth $W$. Let $h$ be the impulse response of a whitening
filter for $(N(t))$ with respect to $W$.

(i) Show that the $n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h \star \overleftarrow{h})(t - \ell T_s)\, dt, \quad \ell \in \{1,\ldots,n\} \]

form a sufficient statistic for guessing $D_1,\ldots,D_k$ based on $(Y(t))$.

(ii) Compute their conditional law.


Exercise 28.3 (A Channel with an Echo). Data bits $D_1,\ldots,D_k$ are mapped to real symbols
$X_1,\ldots,X_k$ using the antipodal mapping, so $X_\ell = 1 - 2D_\ell$ for every $\ell \in \{1,\ldots,k\}$.
The transmitted signal $(X(t))$ is given by $X(t) = A \sum_{\ell=1}^{k} X_\ell\, \phi(t - \ell T_s)$, where $\phi$ is an
integrable signal that is bandlimited to $W$ Hz and that satisfies the orthonormality condition
(28.18). The received signal $(Y(t))$ is

\[ Y(t) = X(t) + \alpha X(t - T_s) + N(t), \quad t \in \mathbb{R}, \]

where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to the bandwidth $W$,
and $\alpha$ is a real constant. Let $Y_\ell$ be the time-$\ell T_s$ output of a filter that is matched to $\phi$
and that is fed $(Y(t))$.

(i) Do $Y_1,\ldots,Y_{k+1}$ form a sufficient statistic for guessing $(D_1,\ldots,D_k)$?

(ii) Consider a suboptimal rule that guesses “$D_j = 0$” if $Y_j > 0$, and otherwise guesses
“$D_j = 1$.” Express the probability that this rule guesses $D_j$ incorrectly in terms
of $j$, $\alpha$, $A$, and $N_0$. To what does this probability of error converge when $N_0$ tends
to zero?


Exercise 28.4 (Another Channel with an Echo). Consider the setup of Exercise 28.3 but
where the echo is delayed by a noninteger multiple of the baud period. Thus,

\[ Y(t) = X(t) + \alpha X(t - \tau) + N(t), \quad t \in \mathbb{R}, \]

where $0 < \tau < T_s$. Show that the $2k$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y(t)\, \phi(t - \ell T_s - \tau)\, dt, \quad \ell \in \{1,\ldots,k\} \]

form a sufficient statistic for guessing $(D_1,\ldots,D_k)$ based on $(Y(t))$.




Exercise 28.5 (A Multiple-Access Scenario). Two transmitters communicate with a single
receiver. The receiver observes the signal

\[ Y(t) = A_1 X_1\, \phi_1(t) + A_2 X_2\, \phi_2(t) + N(t), \quad t \in \mathbb{R}, \]

where $A_1, A_2 > 0$; $\phi_1$ and $\phi_2$ are orthonormal integrable signals that are bandlimited
to $W$ Hz; the pair $(X_1, X_2)$ takes value in the set $\{(+1,+1), (+1,-1), (-1,+1), (-1,-1)\}$
equiprobably; and where $(N(t))$ is white Gaussian noise of PSD $N_0/2$ with respect to
the bandwidth $W$.

(i) Can you recover $(X_1, X_2)$ from $A_1 X_1 \phi_1 + A_2 X_2 \phi_2$?

(ii) Find an optimal receiver for guessing $(X_1, X_2)$ based on $(Y(t))$.

(iii) Compute the optimal probability of error for guessing $(X_1, X_2)$ based on $(Y(t))$.

(iv) Suppose that a genie informs the receiver of the value of $X_2$. How should the
receiver then guess $X_1$ based on $(Y(t))$ and the information provided by the genie?

(v) A receiver guesses “$X_1 = +1$” if $\langle Y, \phi_1 \rangle > 0$ and guesses “$X_1 = -1$” otherwise. Is
this receiver optimal for guessing $X_1$?


Exercise 28.6 (Two Receiver Antennas). Consider the setup of (28.1). We observe two
signals $(Y_1(t))$, $(Y_2(t))$ that are given at every epoch $t \in \mathbb{R}$ by

\[ Y_1(t) = (X \star h_1)(t) + N_1(t), \quad Y_2(t) = (X \star h_2)(t) + N_2(t), \]

where $h_1$ and $h_2$ are the impulse responses of two real stable filters, and where the
stochastic processes $(N_1(t))$ and $(N_2(t))$ are independent white Gaussian noise processes
of PSD $N_0/2$ with respect to the bandwidth $W$.

(i) Extend Definition 26.3.1 to the case where the observation consists of two stochastic
processes.

(ii) Show that the $2n$ inner products

\[ \int_{-\infty}^{\infty} Y_1(t)\, (g \star h_1)(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y_2(t)\, (g \star h_2)(t - \ell T_s)\, dt, \quad \ell \in \{1,\ldots,n\} \]

form a sufficient statistic for guessing $D_1,\ldots,D_k$ based on $(Y_1(t))$ and $(Y_2(t))$.


Exercise 28.7 (Bits of Unequal Importance). Consider the setup of Section 28.3 but
where some data bits are more important than others. We therefore wish to minimize the
weighted average

\[ \sum_{j=1}^{k} \alpha_j \Pr\bigl[D_j \neq \hat{D}_j\bigr], \]

for some positive $\alpha_1,\ldots,\alpha_k$ that sum to one.

(i) Is it still optimal to base our guess of $D_1,\ldots,D_k$ on the inner products in (28.11)?

(ii) Does this criterion lead to a different receiver design than the bit error rate?


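Exercise 28.8 (Sandwiching the Probability of a Message Error). In the notation of
Section 28.3, show that

\[ \frac{1}{k} \sum_{j=1}^{k} \Pr\bigl[D_j \neq \hat{D}_j\bigr] \le \max_{1 \le j \le k} \Pr\bigl[D_j \neq \hat{D}_j\bigr] \le \Pr\bigl[\mathbf{D} \neq \hat{\mathbf{D}}\bigr] \le \sum_{j=1}^{k} \Pr\bigl[D_j \neq \hat{D}_j\bigr]. \]

Exercise 28.9 (Sandwiching the Bit Error Rate). In the notation of Section 28.3, show
that

\[ \frac{1}{k} \Pr\bigl[\mathbf{D} \neq \hat{\mathbf{D}}\bigr] \le \frac{1}{k} \sum_{j=1}^{k} \Pr\bigl[D_j \neq \hat{D}_j\bigr] \le \Pr\bigl[\mathbf{D} \neq \hat{\mathbf{D}}\bigr]. \]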


Exercise 28.10 (Transmission via an Unknown Dispersive Channel). A random switch
that is outside our control and whose realization is not observed determines whether the
observed output $(Y(t))$ is given by

\[ X \star h_1 + N \quad \text{or} \quad X \star h_2 + N, \]

where $(X(t))$ is the transmitted signal of (28.1); $(N(t))$ is white Gaussian noise of
PSD $N_0/2$ with respect to the bandwidth $W$; and $h_1$ and $h_2$ are the impulse responses of
two stable real filters. Show that the $2n$ inner products

\[ \int_{-\infty}^{\infty} Y(t)\, (g \star h_1)(t - \ell T_s)\, dt, \quad \int_{-\infty}^{\infty} Y(t)\, (g \star h_2)(t - \ell T_s)\, dt, \quad \ell \in \{1,\ldots,n\} \]

form a sufficient statistic for guessing $D_1,\ldots,D_k$ based on $(Y(t))$.


Chapter 29 


Linear Binary Block Codes with Antipodal 
Signaling 


29.1 Introduction and Setup 


We have thus far said very little about the design of good encoders. We mentioned
block encoders but, apart from defining and studying some of their basic
properties (such as rate and energy per symbol), we have said very little about
how to design such encoders. The design of block encoders falls under the heading
of “Coding Theory” and is the subject of numerous books such as (MacWilliams
and Sloane, 1977), (van Lint, 1998), (Blahut, 2002), (Roth, 2006) and (Richardson
and Urbanke, 2008). Here we provide only a glimpse of this theory for one
class of such encoders: the class of binary linear block encoders with antipodal
pulse amplitude modulation. Such encoders map the data bits $D_1,\ldots,D_K$ to the
real symbols $X_1,\ldots,X_N$ by first applying a one-to-one linear mapping of binary
$K$-tuples to binary $N$-tuples and by then applying the antipodal mapping

\[ 0 \mapsto +1, \qquad 1 \mapsto -1 \]

to each component of the binary $N$-tuple to produce the $\{\pm 1\}$-valued symbols
$X_1,\ldots,X_N$.

Our emphasis in this chapter is not on the design of such encoders, but on how
their properties influence the performance of communication systems that employ
them in combination with Pulse Amplitude Modulation. We thus assume that the
transmitted waveform is given by

\[ A \sum_{\ell} X_\ell\, \phi(t - \ell T_s), \quad t \in \mathbb{R}, \tag{29.1} \]

where $A > 0$ is a scaling factor, $T_s > 0$ is the baud period, $\phi(\cdot)$ is a real integrable
signal that is bandlimited to $W$ Hz, and where the time shifts of $\phi(\cdot)$ by integer
multiples of $T_s$ are orthonormal

\[ \int_{-\infty}^{\infty} \phi(t - \ell T_s)\, \phi(t - \ell' T_s)\, dt = \mathrm{I}\{\ell = \ell'\}, \quad \ell,\ell' \in \mathbb{Z}. \tag{29.2} \]

The summation in (29.1) can be finite, as in the block-mode that we discussed
in Section 10.4, or infinite, as in the bi-infinite block-mode that we discussed in
Section 14.5.2. We shall further assume that the PAM signal is transmitted over
an additive noise channel where the transmitted signal is corrupted by Gaussian
noise that is white with respect to the bandwidth $W$. We also assume that the
data are IID random bits (Definition 14.5.1).


In Section 29.2 we briefly discuss the binary field $\mathbb{F}_2$ and discuss some of the basic
properties of the set of all binary $\kappa$-tuples when it is viewed as a vector space over
this field. This allows us in Section 29.3 to define linear binary encoders and codes.
Section 29.4 introduces binary encoders with antipodal signaling, and Section 29.5
discusses the power and power spectral density when they are employed in conjunction
with PAM. Section 29.6 begins the study of decoding with a discussion of two
performance criteria: the probability of a block error (also called message error)
and the probability of a bit error. It also recalls the discrete-time single-block
channel model. Section 29.7 contains the design and performance analysis of the
guessing rule that minimizes the probability of a block error, and Section 29.8 contains
a similar analysis for the guessing rule that minimizes the probability of a bit
error. Section 29.9 explains why performance analysis and simulation is often done
under the assumption that the transmitted data is the all-zero data. Section 29.10
discusses how the encoder and the PAM parameters influence the overall system
performance. The chapter concludes with a discussion of the (suboptimal) Hard
Decision decoding rule in Section 29.11 and of bounds on the minimum distance
of a code in Section 29.12.

29.2 The Binary Field $\mathbb{F}_2$ and the Vector Space $\mathbb{F}_2^\kappa$

29.2.1 The Binary Field $\mathbb{F}_2$


The binary field $\mathbb{F}_2$ consists of two elements that we denote by 0 and 1. An
operation that we denote by $\oplus$ is defined between any two elements of $\mathbb{F}_2$ through
the relation

\[ 0 \oplus 0 = 0, \quad 0 \oplus 1 = 1, \quad 1 \oplus 0 = 1, \quad 1 \oplus 1 = 0. \tag{29.3} \]

This operation is sometimes called “mod 2 addition” or “exclusive-or” or “GF(2)
addition.” (Here GF(2) stands for the Galois Field of two elements after the French
mathematician Évariste Galois (1811–1832) who did ground-breaking work on finite
fields and groups.) Another operation, “GF(2) multiplication,” is denoted by a
dot and is defined via the relation

\[ 0 \cdot 0 = 0, \quad 0 \cdot 1 = 0, \quad 1 \cdot 0 = 0, \quad 1 \cdot 1 = 1. \tag{29.4} \]

Combined with these operations, the set $\mathbb{F}_2$ forms a field, which is sometimes
called the Galois Field of size two. We leave it to the reader to verify that the $\oplus$
operation satisfies

\begin{align*}
a \oplus b &= b \oplus a, & a,b &\in \mathbb{F}_2, \\
(a \oplus b) \oplus c &= a \oplus (b \oplus c), & a,b,c &\in \mathbb{F}_2, \\
a \oplus 0 &= 0 \oplus a = a, & a &\in \mathbb{F}_2, \\
a \oplus a &= 0, & a &\in \mathbb{F}_2;
\end{align*}

and that the operations $\oplus$ and $\cdot$ satisfy the distributive law

\[ (a \oplus b) \cdot c = (a \cdot c) \oplus (b \cdot c), \quad a,b,c \in \mathbb{F}_2. \]
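Since $\mathbb{F}_2$ has only two elements, the identities that the reader is asked to verify can be checked exhaustively. The sketch below represents the elements of $\mathbb{F}_2$ by the Python integers 0 and 1; the function and variable names are assumptions made for this example.

```python
from itertools import product


def add(a, b):
    """GF(2) addition (29.3): exclusive-or."""
    return a ^ b


def mul(a, b):
    """GF(2) multiplication (29.4): logical and."""
    return a & b


F2 = (0, 1)
commutative = all(add(a, b) == add(b, a) for a, b in product(F2, repeat=2))
associative = all(add(add(a, b), c) == add(a, add(b, c))
                  for a, b, c in product(F2, repeat=3))
zero_neutral = all(add(a, 0) == add(0, a) == a for a in F2)
self_inverse = all(add(a, a) == 0 for a in F2)
distributive = all(mul(add(a, b), c) == add(mul(a, c), mul(b, c))
                   for a, b, c in product(F2, repeat=3))
```

Each of the five flags evaluates to `True`, one for each identity listed above.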
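
29.2.2 The Vector Space $\mathbb{F}_2^\kappa$

We denote the set of all binary $\kappa$-tuples by $\mathbb{F}_2^\kappa$ and define the componentwise-$\oplus$
operation between $\kappa$-tuples $\mathbf{u} = (u_1,\ldots,u_\kappa) \in \mathbb{F}_2^\kappa$ and $\mathbf{v} = (v_1,\ldots,v_\kappa) \in \mathbb{F}_2^\kappa$ as

\[ \mathbf{u} \oplus \mathbf{v} \triangleq (u_1 \oplus v_1,\ldots,u_\kappa \oplus v_\kappa), \quad \mathbf{u},\mathbf{v} \in \mathbb{F}_2^\kappa. \tag{29.5} \]

We define the product between a scalar $\alpha \in \mathbb{F}_2$ and a $\kappa$-tuple $\mathbf{u} = (u_1,\ldots,u_\kappa) \in \mathbb{F}_2^\kappa$
by

\[ \alpha \cdot \mathbf{u} \triangleq (\alpha \cdot u_1,\ldots,\alpha \cdot u_\kappa). \tag{29.6} \]

With these operations the set $\mathbb{F}_2^\kappa$ forms a vector space over the field $\mathbb{F}_2$. The all-zero
$\kappa$-tuple is denoted by $\mathbf{0}$.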


29.2.3 Linear Mappings 
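A mapping $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is said to be linear if

\[ T(\alpha \cdot \mathbf{u} \oplus \beta \cdot \mathbf{v}) = \alpha \cdot T(\mathbf{u}) \oplus \beta \cdot T(\mathbf{v}), \quad \bigl(\alpha,\beta \in \mathbb{F}_2,\ \mathbf{u},\mathbf{v} \in \mathbb{F}_2^\kappa\bigr). \tag{29.7} \]

The kernel of a linear mapping $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is denoted by $\operatorname{Ker}(T)$ and is the set
of all $\kappa$-tuples in $\mathbb{F}_2^\kappa$ that are mapped by $T(\cdot)$ to the all-zero $\eta$-tuple $\mathbf{0}$:

\[ \operatorname{Ker}(T) = \bigl\{\mathbf{u} \in \mathbb{F}_2^\kappa : T(\mathbf{u}) = \mathbf{0}\bigr\}. \tag{29.8} \]

The kernel of every linear mapping contains the all-zero tuple $\mathbf{0}$.

The image of $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ is denoted by $\operatorname{Image}(T)$ and consists of those elements
of $\mathbb{F}_2^\eta$ to which some element of $\mathbb{F}_2^\kappa$ is mapped by $T(\cdot)$:

\[ \operatorname{Image}(T) = \bigl\{T(\mathbf{u}) : \mathbf{u} \in \mathbb{F}_2^\kappa\bigr\}. \tag{29.9} \]

The key results from Linear Algebra that we need are summarized in the following
proposition.

Proposition 29.2.1. Let $T\colon \mathbb{F}_2^\kappa \to \mathbb{F}_2^\eta$ be linear.

(i) The kernel of $T(\cdot)$ is a linear subspace of $\mathbb{F}_2^\kappa$.

(ii) The mapping $T(\cdot)$ is one-to-one if, and only if, $\operatorname{Ker}(T) = \{\mathbf{0}\}$.

(iii) The image of $T(\cdot)$ is a linear subspace of $\mathbb{F}_2^\eta$.

(iv) The sum of the dimension of the kernel and the dimension of the image space
is equal to the dimension of the domain:

\[ \operatorname{Dim}\bigl(\operatorname{Ker}(T)\bigr) + \operatorname{Dim}\bigl(\operatorname{Image}(T)\bigr) = \kappa. \tag{29.10} \]

(v) If $\mathcal{U}$ is a linear subspace of $\mathbb{F}_2^\eta$ of dimension $\kappa$, then there exists a one-to-one
linear mapping from $\mathbb{F}_2^\kappa$ to $\mathbb{F}_2^\eta$ whose image is $\mathcal{U}$.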
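For small $\kappa$ and $\eta$ the kernel and the image of a linear mapping can be found by brute force, which makes (29.10) easy to check on an example. The sketch below assumes that the mapping is given by a binary matrix `G` through $T(\mathbf{u}) = \mathbf{u}\mathsf{G}$; the matrix, the helper names, and this representation are choices made for illustration. Dimensions are computed from the fact that a subspace of dimension $m$ over $\mathbb{F}_2$ has $2^m$ elements.

```python
from itertools import product


def apply(G, u):
    """T(u) = u G over F_2, with G a kappa-by-eta binary matrix (list of rows)."""
    eta = len(G[0])
    return tuple(sum(u[i] & G[i][j] for i in range(len(G))) % 2
                 for j in range(eta))


def dim(subspace):
    """Dimension of a linear subspace over F_2: log2 of its number of elements."""
    return len(subspace).bit_length() - 1


# Hypothetical mapping from F_2^3 to F_2^4; the third row is the sum of the first two.
G = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
kappa = len(G)
domain = list(product((0, 1), repeat=kappa))
kernel = {u for u in domain if not any(apply(G, u))}
image = {apply(G, u) for u in domain}
```

Because the rows of this particular `G` are linearly dependent, the kernel contains a nonzero tuple and the mapping is not one-to-one, in line with Part (ii).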

29.2.4 Hamming Distance and Hamming Weight


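The Hamming distance $d_{\mathrm{H}}(\mathbf{u},\mathbf{v})$ between two binary $\kappa$-tuples $\mathbf{u}$ and $\mathbf{v}$ is defined
as the number of components in which they differ. For example, the Hamming
distance between the tuples $(1,0,1,0)$ and $(0,0,1,1)$ is two. It is easy to prove
that for $\mathbf{u},\mathbf{v},\mathbf{w} \in \mathbb{F}_2^\kappa$:

\begin{align*}
& d_{\mathrm{H}}(\mathbf{u},\mathbf{v}) \ge 0 \text{ with equality if, and only if, } \mathbf{u} = \mathbf{v}; \tag{29.11a} \\
& d_{\mathrm{H}}(\mathbf{u},\mathbf{v}) = d_{\mathrm{H}}(\mathbf{v},\mathbf{u}); \tag{29.11b} \\
& d_{\mathrm{H}}(\mathbf{u},\mathbf{w}) \le d_{\mathrm{H}}(\mathbf{u},\mathbf{v}) + d_{\mathrm{H}}(\mathbf{v},\mathbf{w}). \tag{29.11c}
\end{align*}

The Hamming weight $w_{\mathrm{H}}(\mathbf{u})$ of a binary $\kappa$-tuple $\mathbf{u}$ is defined as the number of
its nonzero components. Thus,

\[ w_{\mathrm{H}}(\mathbf{u}) = d_{\mathrm{H}}(\mathbf{u},\mathbf{0}), \quad \mathbf{u} \in \mathbb{F}_2^\kappa, \tag{29.12} \]

and

\[ d_{\mathrm{H}}(\mathbf{u},\mathbf{v}) = w_{\mathrm{H}}(\mathbf{u} \oplus \mathbf{v}), \quad \mathbf{u},\mathbf{v} \in \mathbb{F}_2^\kappa. \tag{29.13} \]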
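These definitions translate directly into code. The following sketch represents binary tuples as Python tuples of 0s and 1s (an assumption made for this example) and checks the example from the text together with (29.11c) and (29.13) for all tuples of length three.

```python
from itertools import product


def hamming_distance(u, v):
    """Number of components in which u and v differ."""
    return sum(a != b for a, b in zip(u, v))


def hamming_weight(u):
    """Number of nonzero components of u."""
    return sum(1 for a in u if a != 0)


def xor(u, v):
    """Componentwise mod-2 sum (29.5)."""
    return tuple(a ^ b for a, b in zip(u, v))


example = hamming_distance((1, 0, 1, 0), (0, 0, 1, 1))  # the example from the text
tuples = list(product((0, 1), repeat=3))
identity_29_13 = all(hamming_distance(u, v) == hamming_weight(xor(u, v))
                     for u in tuples for v in tuples)
triangle = all(
    hamming_distance(u, w) <= hamming_distance(u, v) + hamming_distance(v, w)
    for u in tuples for v in tuples for w in tuples)
```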
29.2.5 The Componentwise Antipodal Mapping 


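The antipodal mapping $\Upsilon\colon \mathbb{F}_2 \to \{-1, +1\}$ maps the zero element of $\mathbb{F}_2$ to the
real number $+1$ and the unit element of $\mathbb{F}_2$ to $-1$:

$$\Upsilon(0) = +1, \quad \Upsilon(1) = -1. \tag{29.14}$$
This rule is not as arbitrary as it may seem. Although one might be somewhat
surprised that we do not map $1 \in \mathbb{F}_2$ to $+1$, we have our reasons. We prefer the
mapping (29.14) because it maps mod-2 sums to real products. Thus,

$$\Upsilon(a \oplus b) = \Upsilon(a)\,\Upsilon(b), \quad a, b \in \mathbb{F}_2, \tag{29.15}$$

where the operation on the RHS between $\Upsilon(a)$ and $\Upsilon(b)$ is the regular real-numbers
multiplication. This extends by induction to any finite number of elements of $\mathbb{F}_2$:

$$\Upsilon(c_1 \oplus c_2 \oplus \cdots \oplus c_j) = \prod_{\ell=1}^{j} \Upsilon(c_\ell), \quad c_1, \ldots, c_j \in \mathbb{F}_2. \tag{29.16}$$

The componentwise antipodal mapping $\Upsilon_\eta\colon \mathbb{F}_2^{\eta} \to \{-1, +1\}^{\eta}$ maps elements
of $\mathbb{F}_2^{\eta}$ to elements of $\{-1, +1\}^{\eta}$ by applying the mapping (29.14) to each component:
$$\Upsilon_\eta\colon (c_1, \ldots, c_\eta) \mapsto \bigl(\Upsilon(c_1), \ldots, \Upsilon(c_\eta)\bigr). \tag{29.17}$$

For example, $\Upsilon_3$ maps the triplet $(0, 0, 1)$ to $(+1, +1, -1)$.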
29.2.6 Hamming Distance and Euclidean Distance
We next relate the Hamming distance $d_{\mathrm{H}}(\mathbf{u}, \mathbf{v})$ between any two binary $\eta$-tuples
$\mathbf{u} = (u_1, \ldots, u_\eta)$ and $\mathbf{v} = (v_1, \ldots, v_\eta)$ to the squared Euclidean distance between
the results of applying the componentwise antipodal mapping $\Upsilon_\eta$ to them. We
argue that

$$d_{\mathrm{E}}^2\bigl(\Upsilon_\eta(\mathbf{u}), \Upsilon_\eta(\mathbf{v})\bigr) = 4\, d_{\mathrm{H}}(\mathbf{u}, \mathbf{v}), \quad \mathbf{u}, \mathbf{v} \in \mathbb{F}_2^{\eta}, \tag{29.18}$$

where $d_{\mathrm{E}}(\cdot, \cdot)$ denotes the Euclidean distance, so

$$d_{\mathrm{E}}^2\bigl(\Upsilon_\eta(\mathbf{u}), \Upsilon_\eta(\mathbf{v})\bigr) = \sum_{\nu=1}^{\eta} \bigl(\Upsilon(u_\nu) - \Upsilon(v_\nu)\bigr)^2. \tag{29.19}$$

To prove this relationship it suffices to consider the case where $\eta = 1$, because the
Hamming distance is the sum of the Hamming distances between the respective
components, and likewise for the squared Euclidean distance. To prove this result
for $\eta = 1$ we note that if the Hamming distance is zero, then $u$ and $v$ are identical
and hence so are $\Upsilon(u)$ and $\Upsilon(v)$, so the Euclidean distance between them must be
zero. And if the Hamming distance is one, then $u \neq v$, and hence $\Upsilon(u)$ and $\Upsilon(v)$
are of opposite sign but of equal unit magnitude, so the squared Euclidean distance
between them is four.
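
The relation (29.18) is easy to confirm by brute force for small tuple lengths. The sketch below is only an illustration under our own naming: it implements the antipodal mapping as bit ↦ 1 − 2·bit (which sends 0 to +1 and 1 to −1, as in (29.14)) and checks (29.18) over all pairs of binary tuples of a given length.

```python
from itertools import product


def antipodal(bit):
    """The mapping (29.14): 0 -> +1 and 1 -> -1."""
    return 1 - 2 * bit


def antipodal_tuple(c):
    """The componentwise antipodal mapping (29.17)."""
    return tuple(antipodal(bit) for bit in c)


def squared_euclidean(x, y):
    """Squared Euclidean distance between two real tuples of equal length."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def check_distance_identity(eta):
    """Return True if (29.18) holds for all pairs of binary eta-tuples."""
    for u in product((0, 1), repeat=eta):
        for v in product((0, 1), repeat=eta):
            d_h = sum(1 for a, b in zip(u, v) if a != b)
            d_e2 = squared_euclidean(antipodal_tuple(u), antipodal_tuple(v))
            if d_e2 != 4 * d_h:
                return False
    return True


print(antipodal_tuple((0, 0, 1)))   # the triplet example from Section 29.2.5
print(check_distance_identity(4))
```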


29.3 Binary Linear Encoders and Codes


Definition 29.3.1 (Linear (K,N) $\mathbb{F}_2$ Encoder and Code). Let N and K be positive
integers.

(i) A linear (K,N) $\mathbb{F}_2$ encoder is a one-to-one linear mapping from $\mathbb{F}_2^{K}$ to $\mathbb{F}_2^{N}$.

(ii) A linear (K,N) $\mathbb{F}_2$ code is a linear subspace of $\mathbb{F}_2^{N}$ of dimension K.¹

In both definitions N is called the blocklength and K is called the dimension.

For example, the (K,K + 1) systematic single parity check encoder is the
mapping
$$(d_1, \ldots, d_K) \mapsto (d_1, \ldots, d_K,\ d_1 \oplus d_2 \oplus \cdots \oplus d_K). \tag{29.20}$$

It appends to the data tuple a single bit that is chosen so that the resulting (K + 1)-tuple
be of even Hamming weight. The (K, K + 1) single parity check code is the
subset of $\mathbb{F}_2^{K+1}$ consisting of those binary (K + 1)-tuples whose Hamming weight is
even.

Recall that the image of a mapping $g\colon \mathcal{A} \to \mathcal{B}$ is the subset of $\mathcal{B}$ comprising those
elements $y \in \mathcal{B}$ to which there corresponds some $x \in \mathcal{A}$ such that $g(x) = y$.

¹ The terminology here is not standard. In the Coding Theory literature a linear (K,N) $\mathbb{F}_2$
code is often called a “binary linear [N, K] code.”
Proposition 29.3.2 ($\mathbb{F}_2$ Encoders and Codes).

(i) If $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is a linear (K,N) $\mathbb{F}_2$ encoder, then its image is a linear
(K,N) $\mathbb{F}_2$ code.

(ii) Every linear (K,N) $\mathbb{F}_2$ code is the image of some (nonunique) linear (K,N)
$\mathbb{F}_2$ encoder.

Proof. We begin with Part (i). Let $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ be a linear (K,N) $\mathbb{F}_2$ encoder.
That its image is a linear subspace of $\mathbb{F}_2^{N}$ follows from Proposition 29.2.1 (iii).
That its dimension must be K follows from Proposition 29.2.1 (iv) (see (29.10))
because the fact that $T(\cdot)$ is one-to-one implies, by Proposition 29.2.1 (ii), that
$\operatorname{Ker}(T) = \{\mathbf{0}\}$ so $\operatorname{Dim}(\operatorname{Ker}(T)) = 0$.

To prove Part (ii) we note that $\mathbb{F}_2^{K}$ is of dimension K and that, by definition, every
linear (K,N) $\mathbb{F}_2$ code is also of dimension K. The result now follows by noting
that there exists a one-to-one linear mapping between any two subspaces of equal
dimensions over the same field (Proposition 29.2.1 (v)).


Any linear transformation from a finite-dimensional space to a finite-dimensional 
space can be represented as matrix multiplication. A linear (K,N) F2 encoder is 
no exception. What is perhaps unusual is that coding theorists use row vectors 
to denote the data K-tuples and the N-tuples to which they are mapped. They 
consequently use matrix multiplication from the left. This tradition is so ingrained 
that we shall begrudgingly adopt it. 


Definition 29.3.3 (Matrix Representation of an Encoder). We say that the linear
(K,N) $\mathbb{F}_2$ encoder $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is represented by the matrix G if G is a K × N
matrix whose elements are in $\mathbb{F}_2$ and

$$T(\mathbf{d}) = \mathbf{d}\mathsf{G}, \quad \mathbf{d} \in \mathbb{F}_2^{K}. \tag{29.21}$$

Note that in the matrix multiplication in (29.21) we use $\mathbb{F}_2$ arithmetic, so the $\eta$-th
component of $\mathbf{d}\mathsf{G}$ is given by $d^{(1)} \cdot g^{(1,\eta)} \oplus \cdots \oplus d^{(K)} \cdot g^{(K,\eta)}$, where $g^{(\kappa,\eta)}$ is the Row-$\kappa$
Column-$\eta$ component of the matrix G, and where $d^{(\kappa)}$ is the $\kappa$-th component of $\mathbf{d}$.
For example, the (K,K + 1) $\mathbb{F}_2$ systematic single parity check encoder (29.20) is
represented by the K × (K + 1) matrix

$$\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 1 \\
0 & 1 & 0 & \cdots & 0 & 1 \\
0 & 0 & 1 & \cdots & 0 & 1 \\
\vdots & \vdots & & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & 1
\end{pmatrix}. \tag{29.22}$$

The matrix G in (29.21) is uniquely specified by the linear transformation $T(\cdot)$:
its $\kappa$-th row is the result of applying $T(\cdot)$ to the K-tuple $(0, \ldots, 0, 1, 0, \ldots, 0)$ (the
K-tuple whose components are all zero except for the $\kappa$-th, which is one).

Moreover, every K × N binary matrix G defines a linear transformation $T(\cdot)$ via
(29.21), but this linear transformation need not be one-to-one. It is one-to-one if,
and only if, the subspace of $\mathbb{F}_2^{N}$ spanned by the rows of G is of dimension K.
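
To make the matrix representation concrete, here is a small Python sketch. It is an illustration only: the helper names `encode`, `spc_generator`, and `rank_f2` are ours, and matrices are stored as lists of rows of 0s and 1s. It computes $\mathbf{d}\mathsf{G}$ in $\mathbb{F}_2$ arithmetic, builds the matrix (29.22), and tests the one-to-one condition by computing the rank of G over $\mathbb{F}_2$ with Gaussian elimination.

```python
def encode(d, G):
    """Row vector d times the K x N matrix G in F2 arithmetic, as in (29.21)."""
    K, N = len(G), len(G[0])
    return tuple(sum(d[k] & G[k][n] for k in range(K)) % 2 for n in range(N))


def spc_generator(K):
    """The K x (K+1) matrix (29.22): an identity block followed by a column of ones."""
    return [[1 if j == i else 0 for j in range(K)] + [1] for i in range(K)]


def rank_f2(G):
    """Rank over F2 of a binary matrix, by Gaussian elimination on a copy."""
    rows = [list(r) for r in G]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


G = spc_generator(3)
print(encode((1, 0, 1), G))   # data bits followed by their mod-2 sum
print(rank_f2(G) == len(G))   # True exactly when d -> dG is one-to-one
```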
Definition 29.3.4 (Generator Matrix). A matrix G is a generator matrix for a
given linear (K,N) $\mathbb{F}_2$ code if G is a binary K × N matrix such that the image of
the mapping $\mathbf{d} \mapsto \mathbf{d}\mathsf{G}$ is the given code.


Note that there may be numerous generator matrices for a given code. For example, 
the matrix (29.22) is a generator matrix for the single parity check code. But there 
are others. Indeed, replacing any row of the above matrix by the sum of that row 
and another different row results in another generator matrix for this code. 


Coding theorists like to distinguish between a code property and an encoder 
property. Code properties are properties that are common to all encoders of the 
same image. Encoder properties are specific to an encoder. Examples of code 
properties are the blocklength and dimension. We shall soon encounter more. An 
example of an encoder property is the property of being systematic: 


Definition 29.3.5 (Systematic Encoder). A linear (K,N) $\mathbb{F}_2$ encoder $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is said to be systematic (or strictly systematic) if, for every K-tuple $(d_1, \ldots, d_K)$
in $\mathbb{F}_2^{K}$, the first K components of $T\bigl((d_1, \ldots, d_K)\bigr)$ are equal to $d_1, \ldots, d_K$.

For example, the encoder (29.20) is systematic. An encoder whose image is the
single-parity check code and which is not systematic is the encoder

$$(d_1, \ldots, d_K) \mapsto (d_1,\ d_1 \oplus d_2,\ d_2 \oplus d_3,\ \ldots,\ d_{K-1} \oplus d_K,\ d_K). \tag{29.23}$$

The reader is encouraged to verify that if a linear (K,N) $\mathbb{F}_2$ encoder $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is represented by the matrix G, then $T(\cdot)$ is systematic if, and only if, the K × K
matrix that results from deleting the last N − K columns of G is the K × K identity
matrix.


Definition 29.3.6 (Parity-Check Matrix). A parity-check matrix for a given
linear (K,N) $\mathbb{F}_2$ code is an (N − K) × N matrix H such that a (row) N-tuple $\mathbf{c}$ is in the
code if, and only if, $\mathbf{c}\mathsf{H}^{\mathsf{T}}$ is the all-zero (row) vector.

For example, a parity-check matrix for the (K,K + 1) single-parity check code is
the 1 × (K + 1) matrix
$$(1, 1, \ldots, 1).$$

(Codes typically have numerous different parity-check matrices, but the single-
parity check code is an exception.)
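
Membership in a code can thus be tested by computing $\mathbf{c}\mathsf{H}^{\mathsf{T}}$. The Python sketch below is illustrative only (the names `syndrome` and `in_code` are ours, and H is stored as a list of rows); it applies this test with the all-ones row, which accepts exactly the tuples of even Hamming weight.

```python
from itertools import product


def syndrome(c, H):
    """The row vector c times the transpose of H, in F2 arithmetic."""
    return tuple(sum(ci & hi for ci, hi in zip(c, row)) % 2 for row in H)


def in_code(c, H):
    """True if c satisfies every parity check, i.e., if its syndrome is all-zero."""
    return not any(syndrome(c, H))


H = [[1, 1, 1, 1]]  # parity-check matrix of the (3, 4) single parity check code
members = [c for c in product((0, 1), repeat=4) if in_code(c, H)]
print(len(members))  # number of 4-tuples of even Hamming weight
```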


29.4 Binary Encoders with Antipodal Signaling 
Definition 29.4.1. 


(i) We say that a (K,N) binary-to-reals block encoder $\operatorname{enc}\colon \{0,1\}^{K} \to \mathbb{R}^{N}$ is a
linear binary (K,N) block encoder with antipodal signaling if

$$\operatorname{enc}(\mathbf{d}) = \Upsilon_N\bigl(T(\mathbf{d})\bigr), \quad \mathbf{d} \in \mathbb{F}_2^{K}, \tag{29.24}$$
where $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is a linear (K,N) $\mathbb{F}_2$ encoder, and where $\Upsilon_N(\cdot)$ is the
componentwise antipodal mapping (29.17). Thus, if $(X_1, \ldots, X_N)$ denotes
the N-tuple produced by $\operatorname{enc}(\cdot)$ when fed the data K-tuple $(D_1, \ldots, D_K)$, then

$$X_n = \begin{cases} +1 & \text{if the $n$-th component of } T\bigl((D_1, \ldots, D_K)\bigr) \text{ is zero,} \\ -1 & \text{otherwise,} \end{cases} \quad n \in \{1, \ldots, N\}. \tag{29.25}$$

(ii) A linear binary (K,N) block code with antipodal signaling is the image
of some linear binary (K,N) block encoder with antipodal signaling.

In analogy to Proposition 29.3.2, the image of every linear binary (K,N) block
encoder with antipodal signaling is a linear binary (K, N) block code with antipodal
signaling.

If $\operatorname{enc}(\cdot)$ can be represented by the application of $T(\cdot)$ to the data K-tuple followed
by the application of the componentwise antipodal mapping $\Upsilon_N$, then we shall
write

$$\operatorname{enc} = \Upsilon_N \circ T. \tag{29.26}$$
Since $\Upsilon_N$ is invertible, there is a one-to-one correspondence between $T$ and enc.


An important code property is the distribution of the result of applying an encoder 
to IID random bits. 


Proposition 29.4.2. Let $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ be a linear (K,N) $\mathbb{F}_2$ encoder.

(i) Applying $T$ to a K-tuple of IID random bits results in a random N-tuple that
is uniformly distributed over $\operatorname{Image}(T)$.

(ii) Applying $\Upsilon_N \circ T$ to IID random bits produces an N-tuple that is uniformly
distributed over the image of $\operatorname{Image}(T)$ under the componentwise antipodal
mapping $\Upsilon_N$.

Proof. Part (i) follows from the fact that the mapping $T(\cdot)$ is one-to-one. Part (ii)
follows from Part (i) and from the fact that $\Upsilon_N(\cdot)$ is one-to-one.

For example, it follows from Proposition 29.4.2 (ii) and from (29.16) that if we
feed IID random bits to any encoder (be it systematic or not) whose image is the
(K, K + 1) single parity check code and then employ the componentwise antipodal
mapping $\Upsilon_{K+1}(\cdot)$, then the resulting random (K + 1)-tuple $(X_1, \ldots, X_{K+1})$ will be
uniformly distributed over the set

$$\Bigl\{ (\xi_1, \ldots, \xi_{K+1}) \in \{-1, +1\}^{K+1} : \prod_{n=1}^{K+1} \xi_n = +1 \Bigr\}.$$


Corollary 29.4.3. Any property that is determined by the joint distribution of the 
result of applying the encoder to IID random bits is a code property. 


Examples of such properties are the power and operational power spectral density, 
which are discussed next. 
29.5 Power and Operational Power Spectral Density


To discuss the transmitted power and the operational power spectral density we 
shall consider bi-infinite block encoding (Section 14.5.2). We shall then use the 
results of Section 14.5.2 and Section 15.4.3 to compute the power and operational 
PSD of the transmitted signal in this mode. 


The impatient reader who is only interested in the transmitted power for pulse 
shapes satisfying the orthogonality condition (29.2) can apply the results of Sec- 
tion 14.5.3 directly to obtain that, subject to the decay condition (14.46), the 
transmitted power P is given by 


$$\mathsf{P} = \frac{A^2}{T_s}. \tag{29.27}$$


We next extend the discussion to general pulse shapes and to the operational PSD. 
To remind the reader that we no longer assume the orthogonality condition (29.2), 
we shall now denote the pulse shape by g(-) and assume that it is bandlimited 
to W Hz and that it satisfies the decay condition (14.17). Before proceeding with 
the analysis of the power and PSD, we wish to characterize linear binary (K, N) 
block encoders with antipodal signaling that map IID random bits to zero-mean 
N-tuples. Note that by Corollary 29.4.3 this is, in fact, a code property. Thus, if 
$\operatorname{enc} = \Upsilon_N \circ T$, then the question of whether $\operatorname{enc}(\cdot)$ maps IID random bits to zero-mean
N-tuples depends only on the image of $T$. Aiding us in this characterization
is the following lemma on linear functionals. A linear functional on $\mathbb{F}_2^{\kappa}$ is a linear
mapping from $\mathbb{F}_2^{\kappa}$ to $\mathbb{F}_2$. The zero functional maps every $\kappa$-tuple in $\mathbb{F}_2^{\kappa}$ to zero.


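Lemma 29.5.1. Let $L\colon \mathbb{F}_2^{K} \to \mathbb{F}_2$ be a linear functional that is not the zero functional.
Then the RV $X$ defined by

$$X = \begin{cases} +1 & \text{if } L\bigl((D_1, \ldots, D_K)\bigr) = 0, \\ -1 & \text{if } L\bigl((D_1, \ldots, D_K)\bigr) = 1 \end{cases} \tag{29.28}$$

is of zero mean whenever $D_1, \ldots, D_K$ are IID random bits.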


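Proof. We begin by expressing the expectation of $X$ as

$$\begin{aligned}
\mathsf{E}[X] &= \sum_{\mathbf{d} \in \mathbb{F}_2^{K}} \Pr[\mathbf{D} = \mathbf{d}]\, \Upsilon\bigl(L(\mathbf{d})\bigr) \\
&= 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K}} \Upsilon\bigl(L(\mathbf{d})\bigr) \\
&= 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K} :\, L(\mathbf{d}) = 0} (+1) \;+\; 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^{K} :\, L(\mathbf{d}) = 1} (-1) \\
&= 2^{-K} \bigl( \# L^{-1}(0) - \# L^{-1}(1) \bigr),
\end{aligned}$$

where
$$L^{-1}(0) = \bigl\{ \mathbf{d} \in \mathbb{F}_2^{K} : L(\mathbf{d}) = 0 \bigr\}$$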
is the set of all K-tuples in $\mathbb{F}_2^{K}$ that are mapped by $L(\cdot)$ to 0, where $L^{-1}(1)$ is analogously
defined, and where $\#\mathcal{A}$ denotes the number of elements in the set $\mathcal{A}$. It
follows that to prove that $\mathsf{E}[X] = 0$ it suffices to show that if $L(\cdot)$ is not deterministically
zero, then the sets $L^{-1}(0)$ and $L^{-1}(1)$ have the same number of elements.
We prove this by exhibiting a one-to-one mapping from $L^{-1}(0)$ onto $L^{-1}(1)$. (If
there is a one-to-one mapping from a finite set $\mathcal{A}$ onto a finite set $\mathcal{B}$, then $\mathcal{A}$ and $\mathcal{B}$
must have the same number of elements.) To exhibit this mapping, note that the
assumption that $L(\cdot)$ is not the zero transformation implies that the set $L^{-1}(1)$ is
not empty. Let $\mathbf{d}_*$ be an element of this set, so

$$L(\mathbf{d}_*) = 1. \tag{29.29}$$
The required mapping maps each $\mathbf{d}_0 \in L^{-1}(0)$ to $\mathbf{d}_0 \oplus \mathbf{d}_*$:
$$L^{-1}(0) \ni \mathbf{d}_0 \mapsto \mathbf{d}_0 \oplus \mathbf{d}_*. \tag{29.30}$$

We next verify that it is a one-to-one mapping from $L^{-1}(0)$ onto $L^{-1}(1)$. That it is
one-to-one follows because if $\mathbf{d}_0 \oplus \mathbf{d}_* = \mathbf{d}_0' \oplus \mathbf{d}_*$ then by adding $\mathbf{d}_*$ to both sides we
obtain $\mathbf{d}_0 \oplus \mathbf{d}_* \oplus \mathbf{d}_* = \mathbf{d}_0' \oplus \mathbf{d}_* \oplus \mathbf{d}_*$, i.e., that $\mathbf{d}_0 = \mathbf{d}_0'$ (because $\mathbf{d}_* \oplus \mathbf{d}_* = \mathbf{0}$). That
this mapping maps each element of $L^{-1}(0)$ to an element of $L^{-1}(1)$ follows because,
as we next show, if $\mathbf{d}_0 \in L^{-1}(0)$, then $L(\mathbf{d}_0 \oplus \mathbf{d}_*) = 1$. Indeed, if $\mathbf{d}_0 \in L^{-1}(0)$, then

$$L(\mathbf{d}_0) = 0, \tag{29.31}$$
and consequently,

$$\begin{aligned}
L(\mathbf{d}_0 \oplus \mathbf{d}_*) &= L(\mathbf{d}_0) \oplus L(\mathbf{d}_*) \\
&= 0 \oplus 1 \\
&= 1,
\end{aligned}$$

where the first equality follows from the linearity of $L(\cdot)$, and where the second
equality follows from (29.29) and (29.31). That the mapping is onto follows by
noting that if $\mathbf{d}_1$ is any element of $L^{-1}(1)$, then $\mathbf{d}_1 \oplus \mathbf{d}_*$ is in $L^{-1}(0)$ and it is
mapped by this mapping to $\mathbf{d}_1$.


Using this lemma we can show: 


Proposition 29.5.2. Let $(X_1, \ldots, X_N)$ be the result of applying a linear binary
(K,N) block encoder with antipodal signaling to a binary K-tuple comprising IID
random bits.

(i) For every $n \in \{1, \ldots, N\}$, the RV $X_n$ is either deterministically equal to $+1$,
or else of zero mean.

(ii) For every $n, n' \in \{1, \ldots, N\}$, the random variables $X_n$ and $X_{n'}$ are either
deterministically equal to each other or else $\mathsf{E}[X_n X_{n'}] = 0$.


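Proof. Let the linear binary (K,N) block encoder with antipodal signaling $\operatorname{enc}(\cdot)$
be given by $\operatorname{enc} = \Upsilon_N \circ T$, where $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$ is one-to-one and linear. Let
$(X_1, \ldots, X_N)$ be the result of applying enc to the K-tuple $\mathbf{D} = (D_1, \ldots, D_K)$,
where $D_1, \ldots, D_K$ are IID random bits.

To prove Part (i), fix some $n \in \{1, \ldots, N\}$, and let $L(\cdot)$ be the linear functional that
maps $\mathbf{d}$ to the $n$-th component of $T(\mathbf{d})$, so $X_n = \Upsilon\bigl(L(\mathbf{D})\bigr)$, where $\mathbf{D}$ denotes the
row vector comprising the K IID random bits. If $L(\cdot)$ maps all data K-tuples to zero,
then $X_n$ is deterministically equal to $+1$. Otherwise, $\mathsf{E}[X_n] = 0$ by Lemma 29.5.1.
To prove Part (ii), let the matrix G represent the mapping $T(\cdot)$, so $X_n = \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,n)}\bigr)$,
where $\mathsf{G}^{(\cdot,n)}$ denotes the $n$-th column of G. Expressing $X_{n'}$ in a similar way, we
obtain from (29.15)

$$\begin{aligned}
X_n X_{n'} &= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,n)}\bigr)\, \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,n')}\bigr) \\
&= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,n)} \oplus \mathbf{D}\mathsf{G}^{(\cdot,n')}\bigr) \\
&= \Upsilon\bigl(\mathbf{D}\bigl(\mathsf{G}^{(\cdot,n)} \oplus \mathsf{G}^{(\cdot,n')}\bigr)\bigr).
\end{aligned} \tag{29.32}$$

Consequently, if we define the linear functional $L\colon \mathbf{d} \mapsto \mathbf{d}\bigl(\mathsf{G}^{(\cdot,n)} \oplus \mathsf{G}^{(\cdot,n')}\bigr)$, then
$X_n X_{n'} = \Upsilon\bigl(L(\mathbf{D})\bigr)$. This linear functional is the zero functional if the $n$-th column
of G is identical to its $n'$-th column, i.e., if $X_n$ is deterministically equal to $X_{n'}$.
Otherwise, it is not the zero functional, and $\mathsf{E}[X_n X_{n'}]$ $\bigl(= \mathsf{E}[\Upsilon(L(\mathbf{D}))]\bigr)$ must be
zero (Lemma 29.5.1).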


Proposition 29.5.3 (Producing Zero-Mean Uncorrelated Symbols). A linear binary
(K,N) block encoder with antipodal signaling $\operatorname{enc} = \Upsilon_N \circ T$ produces zero-mean
uncorrelated symbols when fed IID random bits if, and only if, the columns
of the matrix G representing $T(\cdot)$ are distinct and none of these columns is the
all-zero column.


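Proof. The $n$-th symbol $X_n$ produced by $\operatorname{enc} = \Upsilon_N \circ T$ when fed the K-tuple of
IID random bits $\mathbf{D} = (D_1, \ldots, D_K)$ is given by
$$\begin{aligned}
X_n &= \Upsilon\bigl(\mathbf{D}\mathsf{G}^{(\cdot,n)}\bigr) \\
&= \Upsilon\bigl(D_1 \cdot \mathsf{G}^{(1,n)} \oplus \cdots \oplus D_K \cdot \mathsf{G}^{(K,n)}\bigr),
\end{aligned}$$
where $\mathsf{G}^{(\cdot,n)}$ is the $n$-th column of the K × N generator matrix of $T(\cdot)$. Since the
linear functional
$$\mathbf{d} \mapsto d_1 \cdot \mathsf{G}^{(1,n)} \oplus \cdots \oplus d_K \cdot \mathsf{G}^{(K,n)}$$
is the zero functional if, and only if,
$$\mathsf{G}^{(1,n)} = \cdots = \mathsf{G}^{(K,n)} = 0, \tag{29.33}$$

it follows that $X_n$ is deterministically equal to $+1$ if, and only if, the $n$-th column of G is
zero. From this and Lemma 29.5.1 it follows that all the symbols produced by enc
are of zero mean if, and only if, none of the columns of G is zero.

A similar argument shows that the product $X_n X_{n'}$, which by (29.32) is given by

$$\Upsilon\bigl(\mathbf{D}\bigl(\mathsf{G}^{(\cdot,n)} \oplus \mathsf{G}^{(\cdot,n')}\bigr)\bigr),$$
is deterministically equal to $+1$ if, and only if, the functional
$$\mathbf{d} \mapsto d_1 \cdot \bigl(\mathsf{G}^{(1,n)} \oplus \mathsf{G}^{(1,n')}\bigr) \oplus \cdots \oplus d_K \cdot \bigl(\mathsf{G}^{(K,n)} \oplus \mathsf{G}^{(K,n')}\bigr)$$

is zero, i.e., if, and only if, the $n$-th and $n'$-th columns of G are equal. Otherwise,
by Lemma 29.5.1, we have $\mathsf{E}[X_n X_{n'}] = 0$.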
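
For small K, the column criterion of Proposition 29.5.3 can be compared against a direct computation of the symbol means and correlations. The Python sketch below does this by enumerating all $2^K$ equally likely data tuples with exact rational arithmetic. It is only an illustration; the function names are ours, and the matrix G is stored as a list of K rows.

```python
from fractions import Fraction
from itertools import product


def symbol_moments(G):
    """Exact E[X_n] and E[X_n X_m] when the data bits are IID and uniform."""
    K, N = len(G), len(G[0])
    mean = [Fraction(0)] * N
    corr = [[Fraction(0)] * N for _ in range(N)]
    weight = Fraction(1, 2 ** K)  # probability of each data tuple
    for d in product((0, 1), repeat=K):
        # antipodal symbols of the codeword dG
        x = [1 - 2 * (sum(d[k] & G[k][n] for k in range(K)) % 2) for n in range(N)]
        for n in range(N):
            mean[n] += weight * x[n]
            for m in range(N):
                corr[n][m] += weight * x[n] * x[m]
    return mean, corr


def zero_mean_uncorrelated(G):
    """Direct check that all symbols are zero-mean and pairwise uncorrelated."""
    mean, corr = symbol_moments(G)
    N = len(mean)
    return all(v == 0 for v in mean) and all(
        corr[n][m] == 0 for n in range(N) for m in range(N) if n != m
    )


def columns_distinct_and_nonzero(G):
    """The criterion of Proposition 29.5.3 on the columns of G."""
    cols = list(zip(*G))
    return len(set(cols)) == len(cols) and all(any(col) for col in cols)


G_spc = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]
print(zero_mean_uncorrelated(G_spc), columns_distinct_and_nonzero(G_spc))
```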


Note 29.5.4. By Corollary 29.4.3 the property of producing zero-mean uncorre- 
lated symbols is a code property. 


Proposition 29.5.5 (Power and PSD). Let the linear binary (K,N) block encoder
with antipodal signaling $\operatorname{enc} = \Upsilon_N \circ T$ produce zero-mean uncorrelated symbols
when fed IID random bits, and let the pulse shape $g$ satisfy the decay condition
(14.17). Then the transmitted power $\mathsf{P}$ in bi-infinite block-encoding mode is given
by

$$\mathsf{P} = \frac{A^2}{T_s}\, \|g\|_2^2 \tag{29.34}$$
and the operational PSD is
$$S_{XX}(f) = \frac{A^2}{T_s}\, \bigl|\hat{g}(f)\bigr|^2, \quad f \in \mathbb{R}. \tag{29.35}$$

Proof. The expression (29.34) for the power follows either from (14.33) or (14.38).
The expression for the operational PSD follows either from (15.20) or from (15.23).


Engineers rarely check whether an encoder produces uncorrelated symbols when 
fed IID random bits. The reason may be that they usually deal with pulse shapes $\phi$
satisfying the orthogonality condition (29.2) and the decay condition (14.46). For 
such pulse shapes the power is given by (29.27) without any additional assumptions. 
Also, by Theorem 15.4.1, the bandwidth of the PAM signal is typically equal to the 
bandwidth of the pulse shape. In fact, by that theorem, for linear binary (K, N) 
block encoders with antipodal signaling 


bandwidth of PAM signal = bandwidth of pulse shape, (29.36) 


whenever $A \neq 0$; the pulse shape $g$ is a Borel measurable function satisfying the
decay condition (14.17) for some $\alpha, \beta > 0$; and the encoder produces zero-mean
symbols when fed IID random bits. Thus, if one is not interested in the exact form 
of the operational PSD but only in its support, then one need not check whether 
the encoder produces uncorrelated symbols when fed IID random bits. 
29.6 Performance Criteria


Designing an optimal decoder for linear binary block encoders with antipodal sig- 
naling is conceptually very simple but algorithmically very difficult. The structure 
of the decoder depends on what we mean by “optimal.” In this chapter we focus 
on two notions of optimality: minimizing the probability of a block error—also 
called message error—and minimizing the probability of a bit error. Referring 
to Figure 28.1, we say that a block error occurred in decoding the $\nu$-th block if
at least one of the data bits $(D_{(\nu-1)K+1}, \ldots, D_{(\nu-1)K+K})$ was incorrectly decoded.
We say that a bit error occurred in decoding the $j$-th bit if $D_j$ was incorrectly
decoded.


We consider the case where IID random bits are transmitted in block-mode and 
where the transmitted waveform is corrupted by additive Gaussian noise that is 
white with respect to the bandwidth W of the pulse shape. The pulse shape is 
assumed to satisfy the orthonormality condition (29.2) and the decay condition 
(14.17). From Proposition 28.3.1 it follows that for both optimality criteria, there 
is no loss in optimality in feeding the received waveform to a matched filter for $\phi$
and in basing the decision on the filter’s output sampled at integer multiples of Ts. 
Moreover, for the purposes of decoding a given message it suffices to consider only 
the samples corresponding to the symbols that were produced when the encoder 
encoded the given message (Proposition 28.4.2). Similarly, for decoding a given 
data bit it suffices to consider only the samples corresponding to the symbols that 
were produced when the encoder encoded the message of which the given bit is part. 
These observations lead us (as in Section 28.4.3) to the discrete-time single-block 
model (28.30). For convenience, we repeat this model here (with the additional 
assumption that the data are IID random bits): 


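$$(X_1, \ldots, X_N) = \operatorname{enc}(D_1, \ldots, D_K); \tag{29.37a}$$
$$Y_n = A X_n + Z_n, \quad n \in \{1, \ldots, N\}; \tag{29.37b}$$
$$Z_1, \ldots, Z_N \sim \text{IID } \mathcal{N}\Bigl(0, \frac{N_0}{2}\Bigr); \tag{29.37c}$$
$$D_1, \ldots, D_K \sim \text{IID } \mathcal{U}\bigl(\{0, 1\}\bigr), \tag{29.37d}$$
where $(Z_1, \ldots, Z_N)$ are independent of $(D_1, \ldots, D_K)$. We also introduce some
additional notation. We use $x_n(\mathbf{d})$ for the $n$-th component of the N-tuple to which
the binary K-tuple $\mathbf{d}$ is mapped by $\operatorname{enc}(\cdot)$:

$$x_n(\mathbf{d}) = n\text{-th component of } \operatorname{enc}(\mathbf{d}), \quad \bigl(n \in \{1, \ldots, N\},\ \mathbf{d} \in \mathbb{F}_2^{K}\bigr). \tag{29.38}$$
Denoting the conditional density of $(Y_1, \ldots, Y_N)$ given $(X_1, \ldots, X_N)$ by $f_{\mathbf{Y}|\mathbf{X}}(\cdot)$,
we have for every $\mathbf{y} \in \mathbb{R}^{N}$ of components $y_1, \ldots, y_N$ and for every $\mathbf{x} \in \{-1, +1\}^{N}$
of components $x_1, \ldots, x_N$

$$f_{\mathbf{Y}|\mathbf{X}=\mathbf{x}}(\mathbf{y}) = (\pi N_0)^{-N/2} \prod_{n=1}^{N} \exp\biggl(-\frac{(y_n - A x_n)^2}{N_0}\biggr). \tag{29.39}$$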
Parameter                  | In Section 21.6                       | In Section 29.7
number of observations     | $J$                                   | $N$
number of hypotheses       | $M$                                   | $2^K$
set of hypotheses          | $\{1, \ldots, M\}$                    | $\mathbb{F}_2^{K}$
dummy hypothesis variable  | $m$                                   | $\mathbf{d}$
prior                      | $\{\pi_m\}$                           | uniform
conditional mean tuple     | $(s_m^{(1)}, \ldots, s_m^{(J)})$      | $(A x_1(\mathbf{d}), \ldots, A x_N(\mathbf{d}))$
conditional variance       | $\sigma^2$                            | $N_0/2$

Table 29.1: A conversion table for the setups of Section 21.6 and of Section 29.7.


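Likewise, for every $\mathbf{y} \in \mathbb{R}^{N}$ and every data tuple $\mathbf{d} \in \mathbb{F}_2^{K}$,

$$f_{\mathbf{Y}|\mathbf{D}=\mathbf{d}}(\mathbf{y}) = (\pi N_0)^{-N/2} \prod_{n=1}^{N} \exp\biggl(-\frac{\bigl(y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\biggr). \tag{29.40}$$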


29.7 Minimizing the Block Error Rate 


29.7.1 Optimal Decoding 


To minimize the probability of a block error, we need to use the random N-vector
$\mathbf{Y} = (Y_1, \ldots, Y_N)$ to guess the K-tuple $\mathbf{D} = (D_1, \ldots, D_K)$. This is the type of
problem we addressed in Section 21.6. The translation between the setup of that
section and our current setup is summarized in Table 29.1: the number of observations,
which was given there by $J$, is here $N$; the number of hypotheses, which
was given there by $M$, is here $2^K$; the set of possible messages, which was given
there by $\mathcal{M} = \{1, \ldots, M\}$, is here the set of binary K-tuples $\mathbb{F}_2^{K}$; the dummy
variable for a generic message, which was given there by $m$, is here the binary
K-tuple $\mathbf{d}$; the prior, which was denoted there by $\{\pi_m\}$, is here uniform; the mean
tuple corresponding to the $m$-th message, which was given there by $(s_m^{(1)}, \ldots, s_m^{(J)})$,
is here $(A x_1(\mathbf{d}), \ldots, A x_N(\mathbf{d}))$ (see (29.38)); and the conditional variance of each
observation, which was given there by $\sigma^2$, is here $N_0/2$.


Because all the symbols produced by the encoder take value in $\{-1, +1\}$, it follows
that
$$\sum_{n=1}^{N} \bigl(A x_n(\mathbf{d})\bigr)^2 = A^2 N, \quad \mathbf{d} \in \mathbb{F}_2^{K},$$
so all the mean tuples are of equal Euclidean norm. From Proposition 21.6.1 (iii)
we thus obtain that, to minimize the probability of a block error, our guess should
be the K-tuple $\mathbf{d}^*$ that satisfies

$$\sum_{n=1}^{N} x_n(\mathbf{d}^*)\, Y_n = \max_{\mathbf{d} \in \mathbb{F}_2^{K}} \sum_{n=1}^{N} x_n(\mathbf{d})\, Y_n, \tag{29.41}$$

with ties being resolved uniformly at random among the data tuples that achieve
the maximum. Our guess should thus be the data sequence that when fed to the
encoder produces the $\{\pm 1\}$-valued N-tuple of highest correlation with the observed
tuple $\mathbf{Y}$. Note that, by definition, all block encoders are one-to-one mappings
and thus the mean tuples are distinct. Consequently, by Proposition 21.6.2, the
probability that more than one tuple $\mathbf{d}^*$ satisfies (29.41) is zero.


Since guessing the data tuple is equivalent to guessing the N-tuple to which it is 
mapped, we can also describe the optimal decision rule in terms of the encoder’s 
output. 


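Proposition 29.7.1 (The Max-Correlation Decision Rule). Consider the problem
of guessing $\mathbf{D}$ based on $\mathbf{Y}$ for the setup of Section 29.6.

(i) Picking at random a message from the set

$$\Bigl\{ \tilde{\mathbf{d}} \in \mathbb{F}_2^{K} : \sum_{n=1}^{N} x_n(\tilde{\mathbf{d}})\, Y_n = \max_{\mathbf{d} \in \mathbb{F}_2^{K}} \sum_{n=1}^{N} x_n(\mathbf{d})\, Y_n \Bigr\} \tag{29.42}$$

minimizes the probability of incorrectly guessing $\mathbf{D}$.

(ii) The probability that the above set contains more than one element is zero.

(iii) For the problem of guessing the encoder’s output, picking at random an N-tuple
from the set

$$\Bigl\{ \tilde{\mathbf{x}} \in \operatorname{Image}(\operatorname{enc}) : \sum_{n=1}^{N} \tilde{x}_n Y_n = \max_{\mathbf{x} \in \operatorname{Image}(\operatorname{enc})} \sum_{n=1}^{N} x_n Y_n \Bigr\} \tag{29.43}$$
minimizes the probability of error. This set contains more than one element
with probability zero.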


Conceptually, the problem of finding an N-tuple that has the highest correlation
with $(Y_1, \ldots, Y_N)$ among all the N-tuples in the image of $\operatorname{enc}(\cdot)$ is very simple: one
goes over the list of all the $2^K$ N-tuples that are in the image of $\operatorname{enc}(\cdot)$ and picks
the one that has the highest correlation with $(Y_1, \ldots, Y_N)$. But algorithmically
this is very difficult because $2^K$ is in most applications a huge number. It is one of
the challenges of Coding Theory to come up with encoders for which the decoding
does not require an exhaustive search over all $2^K$ tuples. As we shall see, the
single parity check code is an example of such a code. But the performance of this 
encoder is, alas, not stellar. 
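
The exhaustive search described above is only a few lines of code, which makes its exponential cost easy to appreciate. The following Python sketch is an illustration under our own assumptions: the encoder is specified by a matrix G stored as a list of rows, the function names are ours, and ties (which occur with probability zero) are resolved by keeping the first maximizer found rather than by a random choice.

```python
from itertools import product


def encode_antipodal(d, G):
    """The +/-1 valued N-tuple produced from the data tuple d, i.e., Upsilon_N(dG)."""
    K, N = len(G), len(G[0])
    return tuple(1 - 2 * (sum(d[k] & G[k][n] for k in range(K)) % 2) for n in range(N))


def max_correlation_decode(y, G):
    """Search all 2^K data tuples for one maximizing the correlation in (29.41)."""
    best_d, best_corr = None, float("-inf")
    for d in product((0, 1), repeat=len(G)):
        x = encode_antipodal(d, G)
        corr = sum(xn * yn for xn, yn in zip(x, y))
        if corr > best_corr:  # strict inequality: the first maximizer is kept
            best_d, best_corr = d, corr
    return best_d


G = [[1, 0, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1]]  # (3, 4) single parity check encoder
print(max_correlation_decode((0.9, -1.1, 0.2, -0.8), G))
```

The loop visits $2^K$ data tuples and spends on the order of $KN$ operations on each, so it is practical only for very small K.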


29.7.2 Wagner’s Rule 


For the (K, K +1) systematic single parity check encoder (29.20), the decoding can 
be performed very efficiently using a decision algorithm that is called Wagner’s 
Rule in honor of C.A. Wagner. Unlike the brute-force approach that considers all 
possible data tuples and which thus has a complexity which is exponential in K, 
the complexity of Wagner’s Rule is linear in K. 
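Wagner’s Rule can be summarized as follows. Consider the (K + 1)-tuple

$$\xi_n = \begin{cases} +1 & \text{if } Y_n \ge 0, \\ -1 & \text{otherwise,} \end{cases} \quad n = 1, \ldots, K + 1. \tag{29.44}$$

If this tuple has an even number of negative components, then guess that the encoder’s
output is $(\xi_1, \ldots, \xi_{K+1})$ and that the data sequence is thus the inverse of
$(\xi_1, \ldots, \xi_K)$ under the componentwise antipodal mapping $\Upsilon_K$, i.e., that the data
tuple is $\bigl((1 - \xi_1)/2, \ldots, (1 - \xi_K)/2\bigr)$. Otherwise, flip the sign of the $\xi_{n_*}$ corresponding to
the $Y_n$ of smallest magnitude. I.e., guess that the encoder’s output is

$$\bigl(\xi_1, \ldots, \xi_{n_*-1},\ -\xi_{n_*},\ \xi_{n_*+1}, \ldots, \xi_{K+1}\bigr) \tag{29.45}$$

and that the data bits are

$$\Bigl(\frac{1 - \xi_1}{2}, \ldots, \frac{1 - \xi_{n_*-1}}{2},\ \frac{1 + \xi_{n_*}}{2},\ \frac{1 - \xi_{n_*+1}}{2}, \ldots, \frac{1 - \xi_K}{2}\Bigr), \tag{29.46}$$
where $n_*$ is the element of $\{1, \ldots, K + 1\}$ satisfying
$$|Y_{n_*}| = \min_{1 \le n \le K+1} |Y_n|. \tag{29.47}$$
(If $n_* = K + 1$, then only the parity symbol is flipped and the data bits are unaffected.)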


Proof that Wagner’s Rule is Optimal. Recall that the (K,K + 1) single parity
check code with antipodal signaling consists of all $\pm 1$-valued (K + 1)-tuples having
an even number of $-1$’s. We seek to find the tuple that among all such tuples maximizes
the correlation with the received tuple $(Y_1, \ldots, Y_{K+1})$. The tuple defined in
(29.44) is the tuple that among all tuples in $\{-1, +1\}^{K+1}$ has the highest correlation
with $(Y_1, \ldots, Y_{K+1})$. Since flipping the sign of $\xi_n$ reduces the correlation by
$2|Y_n|$, the tuple (29.45) has the second-highest correlation among all the tuples in
$\{-1, +1\}^{K+1}$. Since the tuples (29.44) and (29.45) differ in one component, exactly
one of them has an even number of negative components. That tuple thus maximizes
the correlation among all tuples in $\{-1, +1\}^{K+1}$ that have an even number
of negative components and is thus the tuple we are after.

Since the encoder is systematic, the data tuple that generates a given encoder
output is easily found by considering the first K components of the encoder output
and by then applying the mapping $+1 \mapsto 0$ and $-1 \mapsto 1$, i.e., $\xi \mapsto (1 - \xi)/2$.
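
Wagner’s Rule is short enough to state in code. The Python sketch below is an illustration under our own naming: `wagner_decode` follows (29.44) to (29.47) for the systematic single parity check encoder, and `exhaustive_spc_decode` is a brute-force reference that searches all $\pm 1$-valued tuples with an even number of $-1$’s, which is useful only for checking the rule on short blocks.

```python
from itertools import product


def wagner_decode(y):
    """Wagner's Rule; y holds the K+1 observations. Returns (data bits, output guess)."""
    xi = [1 if yn >= 0 else -1 for yn in y]           # hard decisions (29.44)
    if sum(1 for s in xi if s < 0) % 2 == 1:          # odd number of -1's
        n_star = min(range(len(y)), key=lambda n: abs(y[n]))  # (29.47)
        xi[n_star] = -xi[n_star]                      # flip least reliable symbol
    data = tuple((1 - s) // 2 for s in xi[:-1])       # +1 -> 0, -1 -> 1
    return data, tuple(xi)


def exhaustive_spc_decode(y):
    """Reference decoder: maximize the correlation over all even-parity tuples."""
    best = max(
        (x for x in product((1, -1), repeat=len(y)) if x.count(-1) % 2 == 0),
        key=lambda x: sum(a * b for a, b in zip(x, y)),
    )
    return tuple((1 - s) // 2 for s in best[:-1]), best


y = (0.9, -1.1, 0.2, 0.8)
print(wagner_decode(y))
print(exhaustive_spc_decode(y))
```

The rule needs one pass over the K + 1 observations, whereas the reference search visits $2^K$ tuples.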


29.7.3 The Probability of a Block Error


We next address the performance of the detector that we designed in Section 29.7.1 
when we sought to minimize the probability of a block error. We continue to assume 
that the encoder is a linear binary (K,N) block encoder with antipodal signaling, 
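so the encoder function $\operatorname{enc}(\cdot)$ can be written as $\operatorname{enc} = \Upsilon_N \circ T$ where $T\colon \mathbb{F}_2^{K} \to \mathbb{F}_2^{N}$
is a linear one-to-one mapping and $\Upsilon_N(\cdot)$ is the componentwise antipodal mapping.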
An Upper Bound


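It is usually very difficult to precisely evaluate the probability of a block error. A
very useful bound is the Union Bound, which we encountered in Section 21.6.3.
Denoting by $p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ the probability of error of our guessing rule conditional
on the binary K-tuple $\mathbf{D} = \mathbf{d}$ being fed to the encoder, we can use (21.59),
Table 29.1, and (29.18) to obtain

$$p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\mathbf{d}' \in \mathbb{F}_2^{K} \setminus \{\mathbf{d}\}} Q\Biggl(\sqrt{\frac{2 A^2\, d_{\mathrm{H}}\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr)}{N_0}}\Biggr). \tag{29.48}$$

It is customary to group all the equal terms on the RHS of (29.48) and to write
the bound in the equivalent form

$$p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu=1}^{N} \#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_{\mathrm{H}}\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2 A^2 \nu}{N_0}}\Biggr), \tag{29.49}$$
where
$$\#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_{\mathrm{H}}\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\} \tag{29.50}$$

is the number of data tuples that are mapped by $T(\cdot)$ to a binary N-tuple that
is at Hamming distance $\nu$ from $T(\mathbf{d})$, and where the sum excludes $\nu = 0$ because
the fact that $T(\cdot)$ is one-to-one implies that if $\mathbf{d}' \neq \mathbf{d}$ then the Hamming distance
between $T(\mathbf{d}')$ and $T(\mathbf{d})$ must be at least one.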

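We next show that the linearity of $T(\cdot)$ implies that the RHS of (29.49) does not
depend on $\mathbf{d}$. (In Section 29.9 we show that this is also true of the LHS.) To this
end we show that for every $\nu \in \{1, \ldots, N\}$ and for every $\mathbf{d} \in \mathbb{F}_2^{K}$,

$$\begin{aligned}
\#\bigl\{\mathbf{d}' \in \mathbb{F}_2^{K} : d_{\mathrm{H}}\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\bigr\} &= \#\bigl\{\tilde{\mathbf{d}} \in \mathbb{F}_2^{K} : w_{\mathrm{H}}\bigl(T(\tilde{\mathbf{d}})\bigr) = \nu\bigr\} &\quad (29.51) \\
&= \#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = \nu\bigr\}, &\quad (29.52)
\end{aligned}$$

where the RHS of (29.51) is the evaluation of the LHS at $\mathbf{d} = \mathbf{0}$. To prove (29.51)
we note that the mapping $\mathbf{d}' \mapsto \mathbf{d}' \oplus \mathbf{d}$ is a one-to-one mapping from the set whose
cardinality is written on the LHS to the set whose cardinality is written on the
RHS, because

$$\begin{aligned}
\Bigl(d_{\mathrm{H}}\bigl(T(\mathbf{d}'), T(\mathbf{d})\bigr) = \nu\Bigr) &\iff \Bigl(w_{\mathrm{H}}\bigl(T(\mathbf{d}) \oplus T(\mathbf{d}')\bigr) = \nu\Bigr) \\
&\iff \Bigl(w_{\mathrm{H}}\bigl(T(\mathbf{d} \oplus \mathbf{d}')\bigr) = \nu\Bigr),
\end{aligned}$$

where the first equivalence follows from (29.13), and where the second equivalence
follows from the linearity of $T(\cdot)$. To prove (29.52) we merely substitute $\mathbf{c}$ for $T(\tilde{\mathbf{d}})$
in (29.51) and use the fact that $T(\cdot)$ is one-to-one.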


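Combining (29.49) with (29.52) we obtain the bound

$$p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu=1}^{N} \#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2 A^2 \nu}{N_0}}\Biggr). \tag{29.53}$$

The list of N + 1 nonnegative integers
$$\Bigl(\#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = 0\bigr\}, \ldots, \#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = N\bigr\}\Bigr)$$

(whose first term is equal to one and whose terms sum to $2^K$) is called the weight
enumerator of the code.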


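For example, for the (K, K + 1) single parity check code

$$\#\bigl\{\mathbf{d} \in \mathbb{F}_2^{K} : w_{\mathrm{H}}\bigl(T(\mathbf{d})\bigr) = \nu\bigr\} = \begin{cases} 0 & \text{if $\nu$ is odd,} \\ \binom{K+1}{\nu} & \text{if $\nu$ is even,} \end{cases} \quad \nu = 0, \ldots, K + 1,$$

because this code consists of all (K + 1)-tuples of even Hamming weight. Consequently,
this code’s weight enumerator is

$$\begin{cases}
\Bigl(1, 0, \binom{K+1}{2}, 0, \binom{K+1}{4}, 0, \ldots, 0, 1\Bigr) & \text{if K is odd;} \\[1ex]
\Bigl(1, 0, \binom{K+1}{2}, 0, \binom{K+1}{4}, 0, \ldots, \binom{K+1}{K}, 0\Bigr) & \text{if K is even.}
\end{cases}$$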
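
For short codes the weight enumerator can be computed by listing all codewords. The Python sketch below is an illustration only (the function name is ours, and the encoder is given by a matrix G stored as a list of rows). It counts data tuples by the Hamming weight of their codewords, which equals the count of codewords when the encoder is one-to-one, and prints the result for the (4, 5) single parity check code so that it can be compared with the binomial expression above.

```python
from itertools import product
from math import comb


def weight_enumerator(G):
    """List whose entry nu is the number of data tuples d with w_H(dG) = nu."""
    K, N = len(G), len(G[0])
    counts = [0] * (N + 1)
    for d in product((0, 1), repeat=K):
        codeword = [sum(d[k] & G[k][n] for k in range(K)) % 2 for n in range(N)]
        counts[sum(codeword)] += 1
    return counts


K = 4
G = [[1 if j == i else 0 for j in range(K)] + [1] for i in range(K)]  # matrix (29.22)
print(weight_enumerator(G))
print([comb(K + 1, nu) if nu % 2 == 0 else 0 for nu in range(K + 2)])
```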


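The minimum Hamming distance $d_{\min,\mathrm{H}}$ of a linear (K,N) $\mathbb{F}_2$ code is the
smallest Hamming distance between distinct elements of the code. (If K = 0, i.e.,
if the only codeword is the all-zero codeword, then, by convention, the minimum
distance is said to be infinite.) By (29.52) it follows that (for K > 0) the minimum
Hamming distance of a code is also the smallest weight that a nonzero codeword
can have

$$d_{\min,\mathrm{H}} = \min_{\mathbf{c} \in \operatorname{Image}(T) \setminus \{\mathbf{0}\}} w_{\mathrm{H}}(\mathbf{c}). \tag{29.54}$$
With this definition we can rewrite (29.53) as
$$p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu = d_{\min,\mathrm{H}}}^{N} \#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = \nu\bigr\}\; Q\Biggl(\sqrt{\frac{2 A^2 \nu}{N_0}}\Biggr). \tag{29.55}$$

Engineers sometimes approximate the RHS of (29.55) by its first term:

$$\#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = d_{\min,\mathrm{H}}\bigr\}\; Q\Biggl(\sqrt{\frac{2 A^2 d_{\min,\mathrm{H}}}{N_0}}\Biggr). \tag{29.56}$$

This is reasonable when $A^2/N_0 \gg 1$ because the $Q(\cdot)$ function decays very rapidly;
see (19.18).

The term
$$\#\bigl\{\mathbf{c} \in \operatorname{Image}(T) : w_{\mathrm{H}}(\mathbf{c}) = d_{\min,\mathrm{H}}\bigr\} \tag{29.57}$$

is sometimes called the number of nearest neighbors.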
A Lower Bound


Using the results of Section 21.6.4, we can obtain a lower bound on the probability
of a block error. Indeed, by (21.65), Table 29.1, (29.18), the monotonicity of $Q(\cdot)$,
and the definition of $d_{\min,\mathrm{H}}$

$$p_{\mathrm{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \ge Q\Biggl(\sqrt{\frac{2 A^2 d_{\min,\mathrm{H}}}{N_0}}\Biggr). \tag{29.58}$$


29.8 Minimizing the Bit Error Rate 


In some applications we want to minimize the number of data bits that are incor- 
rectly decoded. This performance criterion leads to a different guessing rule, which 
we derive and analyze in this section. 


29.8.1 Optimal Decoding 


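We next derive the guessing rule that minimizes the average probability of a bit error, or the Bit Error Rate. Conceptually, this is simple. For each $\kappa \in \{1, \ldots, K\}$ our guess of the $\kappa$-th data bit $D_\kappa$ should minimize the probability of error. This problem falls under the category of binary hypothesis testing, and, since $D_\kappa$ is a priori equally likely to be 0 or 1, the Maximum-Likelihood rule of Section 20.8 is optimal. To compute the likelihood-ratio function, we treat the other data bits $D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K$ as unobserved random parameters (Section 20.15.1). Thus, using (20.101) with the random parameter $\Theta$ now corresponding to the tuple $(D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K)$ we obtain²

$$\begin{aligned} f_{\mathbf{Y} \mid D_\kappa = 0}(y_1, \ldots, y_N) &= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} f_{Y_1, \ldots, Y_N \mid \mathbf{D} = \mathbf{d}}(y_1, \ldots, y_N) && (29.59) \\ &= 2^{-(K-1)} (\pi N_0)^{-N/2} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right), && (29.60) \end{aligned}$$

where the set $\mathcal{A}_{\kappa,0}$ consists of those tuples in $\mathbb{F}_2^K$ whose $\kappa$-th component is zero

$$\mathcal{A}_{\kappa,0} = \bigl\{(d_1, \ldots, d_K) \in \mathbb{F}_2^K : d_\kappa = 0\bigr\}. \tag{29.61}$$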
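Likewise,

$$\begin{aligned} f_{\mathbf{Y} \mid D_\kappa = 1}(y_1, \ldots, y_N) &= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} f_{Y_1, \ldots, Y_N \mid \mathbf{D} = \mathbf{d}}(y_1, \ldots, y_N) && (29.62) \\ &= 2^{-(K-1)} (\pi N_0)^{-N/2} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right), && (29.63) \end{aligned}$$

where we similarly define

$$\mathcal{A}_{\kappa,1} = \bigl\{(d_1, \ldots, d_K) \in \mathbb{F}_2^K : d_\kappa = 1\bigr\}. \tag{29.64}$$

²Our assumption that the data are IID random bits guarantees that the random parameter $\Theta \triangleq (D_1, \ldots, D_{\kappa-1}, D_{\kappa+1}, \ldots, D_K)$ is independent of the RV $D_\kappa$ that we wish to guess.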


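Using Theorem 20.7.1 and (29.60) & (29.63) we obtain the following.

Proposition 29.8.1 (Minimizing the BER). Consider the problem of guessing $D_\kappa$ based on $\mathbf{Y}$ for the setup of Section 29.6, where $\kappa \in \{1, \ldots, K\}$. The decision rule that guesses "$D_\kappa = 0$" if

$$\sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(Y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right) > \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(Y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right);$$

that guesses "$D_\kappa = 1$" if

$$\sum_{\mathbf{d} \in \mathcal{A}_{\kappa,0}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(Y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right) < \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} \prod_{n=1}^{N} \exp\!\left(-\frac{\bigl(Y_n - A x_n(\mathbf{d})\bigr)^2}{N_0}\right);$$

and that guesses at random in case of equality minimizes the probability of guessing the data bit $D_\kappa$ incorrectly.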


The difficulty in implementing this decision rule is that, unless we exploit some algebraic structure, the computation of the sums above has exponential complexity because the number of terms in each sum is $2^{K-1}$.

It is interesting to note that, unlike the decision rule that minimizes the probability of a block error, the above decision rule depends on the value of $N_0/2$.
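A brute-force sketch of the decision rule of Proposition 29.8.1 may help make the exponential cost concrete. It assumes a small, hypothetical (2, 3) single parity check encoder; the function names are illustrative, and ties are broken in favor of 0 rather than at random, which is a simplification.

```python
import math
from itertools import product

def antipodal(c):
    """Antipodal mapping: bit 0 -> +1, bit 1 -> -1."""
    return [1.0 if b == 0 else -1.0 for b in c]

def bit_map_decode(y, encode, K, A, N0):
    """Guess every data bit by summing likelihoods over all 2^K messages."""
    sums = [[0.0, 0.0] for _ in range(K)]  # sums[k][b]: total over messages with d_k = b
    for d in product((0, 1), repeat=K):
        x = antipodal(encode(d))
        # product over n of exp(-(y_n - A x_n(d))^2 / N0), written as exp of a sum
        lik = math.exp(-sum((yn - A * xn) ** 2 for yn, xn in zip(y, x)) / N0)
        for k in range(K):
            sums[k][d[k]] += lik
    # simplification: ties go to 0 instead of being resolved at random
    return [0 if s0 >= s1 else 1 for s0, s1 in sums]

def spc_encode(d):
    """Hypothetical (2, 3) single parity check encoder used only for illustration."""
    return (d[0], d[1], d[0] ^ d[1])

# Noisy observation of the codeword for d = (1, 0), whose antipodal image is (-1, +1, -1).
print(bit_map_decode([-0.8, 1.1, -0.3], spc_encode, K=2, A=1.0, N0=1.0))
```

Note that `N0` enters the computation explicitly, in line with the remark that this rule depends on the noise level. The loop visits all $2^K$ messages, so the sketch is practical only for very small $K$.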


29.8.2 The Probability of a Bit Error 


We next obtain bounds on the probability that the detector of Proposition 29.8.1 errs in guessing the $\kappa$-th data bit $D_\kappa$. We denote this probability by $p_\kappa^*$.


An Upper Bound 


Since the detector of Proposition 29.8.1 is optimal, the probability that it errs in decoding the $\kappa$-th data bit $D_\kappa$ cannot exceed the probability of error of the suboptimal rule whose guess for $D_\kappa$ is the $\kappa$-th bit of the message produced by the detector of Section 29.7. Thus, if $\phi_{\text{MAP}}(\cdot)$ denotes the decision rule of Section 29.7, then

$$p_\kappa^* \le \Pr\bigl[\mathbf{D} \oplus \phi_{\text{MAP}}(\mathbf{Y}) \in \mathcal{A}_{\kappa,1}\bigr], \quad \kappa \in \{1, \ldots, K\}, \tag{29.65}$$

where the set $\mathcal{A}_{\kappa,1}$ was defined in (29.64) as the set of messages whose $\kappa$-th component is equal to one, and where $\mathbf{Y}$ is the observed $N$-tuple whose components are given in (29.37b).

Since the data are IID random bits, we can rewrite (29.65) as

$$p_\kappa^* \le \frac{1}{2^K} \sum_{\mathbf{d} \in \mathbb{F}_2^K} \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} \Pr\bigl[\phi_{\text{MAP}}(\mathbf{Y}) = \mathbf{d} \oplus \tilde{\mathbf{d}} \bigm| \mathbf{D} = \mathbf{d}\bigr], \quad \kappa \in \{1, \ldots, K\}. \tag{29.66}$$


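Since $\phi_{\text{MAP}}(\mathbf{Y})$ can only equal $\mathbf{d} \oplus \tilde{\mathbf{d}}$ if $\mathbf{Y}$ is at least as close in Euclidean distance to $\text{enc}(\mathbf{d} \oplus \tilde{\mathbf{d}})$ as it is to $\text{enc}(\mathbf{d})$, it follows from Lemma 20.14.1, Table 29.1, and (29.18) that

$$\begin{aligned} \Pr\bigl[\phi_{\text{MAP}}(\mathbf{Y}) = \mathbf{d} \oplus \tilde{\mathbf{d}} \bigm| \mathbf{D} = \mathbf{d}\bigr] &\le Q\!\left(\frac{A\, d_E\bigl(\Upsilon_N(T(\mathbf{d} \oplus \tilde{\mathbf{d}})), \Upsilon_N(T(\mathbf{d}))\bigr)}{2\sqrt{N_0/2}}\right) \\ &= Q\!\left(\sqrt{\frac{A^2\, d_E^2\bigl(\Upsilon_N(T(\mathbf{d} \oplus \tilde{\mathbf{d}})), \Upsilon_N(T(\mathbf{d}))\bigr)}{2N_0}}\right) \\ &= Q\!\left(\sqrt{\frac{2A^2\, d_H\bigl(T(\mathbf{d} \oplus \tilde{\mathbf{d}}), T(\mathbf{d})\bigr)}{N_0}}\right) \\ &= Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\mathbf{d} \oplus \tilde{\mathbf{d}}) \oplus T(\mathbf{d})\bigr)}{N_0}}\right) \\ &= Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right). && (29.67) \end{aligned}$$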


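It follows from (29.66) and (29.67) upon noting that the RHS of (29.67) does not depend on the transmitted message $\mathbf{d}$ that

$$p_\kappa^* \le \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right), \quad \kappa \in \{1, \ldots, K\}. \tag{29.68}$$

This bound is sometimes written as

$$p_\kappa^* \le \sum_{\nu = d_{\min,H}}^{N} \gamma(\nu, \kappa)\, Q\!\left(\sqrt{\frac{2A^2\nu}{N_0}}\right), \quad \kappa \in \{1, \ldots, K\}, \tag{29.69a}$$

where $\gamma(\nu, \kappa)$ denotes the number of elements $\tilde{\mathbf{d}}$ of $\mathbb{F}_2^K$ whose $\kappa$-th component is equal to one and for which $T(\tilde{\mathbf{d}})$ is of Hamming weight $\nu$, i.e.,

$$\gamma(\nu, \kappa) = \#\bigl\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1} : w_H\bigl(T(\tilde{\mathbf{d}})\bigr) = \nu\bigr\}, \tag{29.69b}$$

and where the minimum Hamming distance $d_{\min,H}$ is defined in (29.54).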


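Sometimes one is more interested in the arithmetic average of $p_\kappa^*$

$$\frac{1}{K} \sum_{\kappa=1}^{K} p_\kappa^*, \tag{29.70}$$

which is the optimal bit error rate. We next show that (29.68) leads to the upper bound

$$\frac{1}{K} \sum_{\kappa=1}^{K} p_\kappa^* \le \frac{1}{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} w_H(\tilde{\mathbf{d}})\, Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right). \tag{29.71}$$

Indeed,

$$\begin{aligned} \frac{1}{K} \sum_{\kappa=1}^{K} p_\kappa^* &\le \frac{1}{K} \sum_{\kappa=1}^{K} \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right) \\ &= \frac{1}{K} \sum_{\kappa=1}^{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right) \mathrm{I}\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\} \\ &= \frac{1}{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right) \sum_{\kappa=1}^{K} \mathrm{I}\{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\} \\ &= \frac{1}{K} \sum_{\tilde{\mathbf{d}} \in \mathbb{F}_2^K} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right) w_H(\tilde{\mathbf{d}}), \end{aligned}$$

where the inequality in the first line follows from (29.68); the equality in the second by introducing the indicator function for the set $\mathcal{A}_{\kappa,1}$ and extending the summation; the equality in the third line by changing the order of summation; and the equality in the last line by noting that every $\tilde{\mathbf{d}} \in \mathbb{F}_2^K$ is in exactly $w_H(\tilde{\mathbf{d}})$ of the sets $\mathcal{A}_{1,1}, \ldots, \mathcal{A}_{K,1}$.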
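The counting argument behind (29.71) can be checked numerically. The sketch below evaluates the per-bit bounds (29.68) and the averaged bound (29.71) by brute force for a hypothetical (3, 4) single parity check encoder; the names and the values of `A` and `N0` are assumptions made for illustration.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def spc_encode(d):
    """Hypothetical (K, K+1) single parity check encoder."""
    return tuple(d) + (sum(d) % 2,)

def bit_error_upper_bounds(encode, K, A, N0):
    """Return ([RHS of (29.68) for each bit position], RHS of (29.71))."""
    per_bit = [0.0] * K
    averaged = 0.0
    for d in product((0, 1), repeat=K):
        q = q_func(math.sqrt(2.0 * A * A * sum(encode(d)) / N0))
        for k in range(K):
            if d[k] == 1:              # d belongs to the set of messages with k-th bit one
                per_bit[k] += q
        averaged += sum(d) * q / K     # weight w_H(d) counts the sets containing d
    return per_bit, averaged

per_bit, averaged = bit_error_upper_bounds(spc_encode, K=3, A=1.5, N0=1.0)
print(per_bit, averaged)
```

Averaging the per-bit bounds reproduces the right-hand side of (29.71) up to rounding, which is exactly the change in the order of summation used in the derivation.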


A Lower Bound 


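We next show that, for every $\kappa \in \{1, \ldots, K\}$, the probability $p_\kappa^*$ that the optimal detector for guessing the $\kappa$-th data bit errs is lower-bounded by

$$p_\kappa^* \ge \max_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right), \tag{29.72}$$

where $\mathcal{A}_{\kappa,1}$ denotes the set of binary $K$-tuples whose $\kappa$-th component is equal to one (29.64). To derive (29.72), fix some $\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}$ and note that for every $\mathbf{d}' \in \mathbb{F}_2^K$

$$\bigl(\mathbf{d}' \in \mathcal{A}_{\kappa,0}\bigr) \iff \bigl(\mathbf{d}' \oplus \tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}\bigr). \tag{29.73}$$

This allows us to express $f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y})$ for every $\mathbf{y} \in \mathbb{R}^N$ as

$$\begin{aligned} f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y}) &= 2^{-(K-1)} \sum_{\mathbf{d} \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) \\ &= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}), && (29.74) \end{aligned}$$

where the first equality follows from (29.62) and the second from (29.73).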


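Using the exact expression for the probability of error in binary hypothesis testing (20.20) we have:

$$\begin{aligned} p_\kappa^* &= \frac{1}{2} \int_{\mathbb{R}^N} \min\bigl\{ f_{\mathbf{Y} \mid D_\kappa = 0}(\mathbf{y}),\, f_{\mathbf{Y} \mid D_\kappa = 1}(\mathbf{y}) \bigr\}\, d\mathbf{y} \\ &= \frac{1}{2} \int_{\mathbb{R}^N} \min\Bigl\{ 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\; 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}) \Bigr\}\, d\mathbf{y} \\ &= 2^{-(K-1)}\, \frac{1}{2} \int_{\mathbb{R}^N} \min\Bigl\{ \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\; \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}) \Bigr\}\, d\mathbf{y} \\ &\ge 2^{-(K-1)}\, \frac{1}{2} \int_{\mathbb{R}^N} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} \min\bigl\{ f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\, f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}) \bigr\}\, d\mathbf{y} \\ &= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} \frac{1}{2} \int_{\mathbb{R}^N} \min\bigl\{ f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}),\, f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'}(\mathbf{y}) \bigr\}\, d\mathbf{y} \\ &= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} Q\!\left(\sqrt{\frac{2A^2\, d_H\bigl(T(\mathbf{d}'), T(\mathbf{d}' \oplus \tilde{\mathbf{d}})\bigr)}{N_0}}\right) \\ &= 2^{-(K-1)} \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right) \\ &= Q\!\left(\sqrt{\frac{2A^2\, w_H\bigl(T(\tilde{\mathbf{d}})\bigr)}{N_0}}\right), \end{aligned}$$

where the first line follows from (20.20); the second by the explicit forms (29.59) & (29.74) of the conditional densities $f_{\mathbf{Y} \mid D_\kappa = 0}(\cdot)$ and $f_{\mathbf{Y} \mid D_\kappa = 1}(\cdot)$; the third by pulling the common term $2^{-(K-1)}$ outside the minimum; the fourth because the minimum between two sums with an equal number of terms is lower-bounded by the sum of the minima between the corresponding terms; the fifth by swapping the summation and integration; the sixth by Expression (20.20) for the optimal probability of error for the binary hypothesis testing between $\mathbf{D} = \mathbf{d}'$ and $\mathbf{D} = \tilde{\mathbf{d}} \oplus \mathbf{d}'$; the seventh by the linearity of $T(\cdot)$; and the final line because the cardinality of $\mathcal{A}_{\kappa,0}$ is $2^{K-1}$. Since the above derivation holds for every $\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}$, we may choose $\tilde{\mathbf{d}}$ to yield the tightest bound, thus establishing (29.72).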


29.9 Assuming the All-Zero Codeword 


When simulating linear binary block encoders with antipodal signaling over the Gaussian channel we rarely simulate the data as IID random bits. Instead we assume that the message that is fed to the encoder is the all-zero message and that the encoder's output is hence the $N$-tuple whose components are all $+1$. In this section we shall explain why it is correct to do so. More specifically, we shall show that $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ does not depend on the message $\mathbf{d}$ and is thus equal to $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0})$. We shall also prove an analogous result for the decoder that minimizes the probability of a bit error. The proofs are based on two features of our setup: the encoder is linear and the Gaussian channel with antipodal inputs is symmetric in the sense that

$$f_{Y \mid X = -1}(y) = f_{Y \mid X = +1}(-y), \quad y \in \mathbb{R}. \tag{29.75}$$

Indeed, by (29.37b),

$$\begin{aligned} f_{Y \mid X = -1}(y) &= \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(y + A)^2}{N_0}} \\ &= \frac{1}{\sqrt{\pi N_0}}\, e^{-\frac{(-y - A)^2}{N_0}} \\ &= f_{Y \mid X = +1}(-y), \quad y \in \mathbb{R}. \end{aligned}$$
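Definition 29.9.1 (Memoryless Binary-Input/Output-Symmetric Channel). We say that the conditional distribution of $\mathbf{Y} = (Y_1, \ldots, Y_N)$ conditional on $\mathbf{X} = (X_1, \ldots, X_N)$ corresponds to a memoryless binary-input/output-symmetric channel if

$$f_{\mathbf{Y} \mid \mathbf{X} = \mathbf{x}}(\mathbf{y}) = \prod_{n=1}^{N} f_{Y \mid X = x_n}(y_n), \quad \mathbf{x} \in \{-1, +1\}^N, \tag{29.76a}$$

where

$$f_{Y \mid X = -1}(y) = f_{Y \mid X = +1}(-y), \quad y \in \mathbb{R}. \tag{29.76b}$$

For every $\mathbf{d} \in \mathbb{F}_2^K$ define the mapping $\psi_{\mathbf{d}} \colon \mathbb{R}^N \to \mathbb{R}^N$ as

$$\psi_{\mathbf{d}} \colon (y_1, \ldots, y_N) \mapsto \bigl(y_1 x_1(\mathbf{d}), \ldots, y_N x_N(\mathbf{d})\bigr). \tag{29.77}$$

The function $\psi_{\mathbf{d}}(\cdot)$ thus changes the sign of those components of its argument that correspond to the negative components of $\text{enc}(\mathbf{d})$. The key properties of this mapping are summarized in the following lemma.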


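Lemma 29.9.2. As in (29.38), let $x_n(\mathbf{d})$ denote the result of applying the antipodal mapping $\Upsilon$ to the $n$-th component of $T(\mathbf{d})$, where $T \colon \mathbb{F}_2^K \to \mathbb{F}_2^N$ is some one-to-one linear mapping. Let the conditional law of $(Y_1, \ldots, Y_N)$ given $\mathbf{D} = \mathbf{d}$ be given by $\prod_{n=1}^{N} f_{Y \mid X = x_n(\mathbf{d})}(y_n)$, where $f_{Y \mid X}(\cdot)$ satisfies the symmetry property (29.75). Let $\psi_{\mathbf{d}}(\cdot)$ be defined as in (29.77). Then

(i) $\psi_{\mathbf{0}}(\cdot)$ maps each $\mathbf{y} \in \mathbb{R}^N$ to itself.

(ii) For any $\mathbf{d}, \mathbf{d}' \in \mathbb{F}_2^K$ the composition of $\psi_{\mathbf{d}'}$ with $\psi_{\mathbf{d}}$ is given by $\psi_{\mathbf{d} \oplus \mathbf{d}'}$:

$$\psi_{\mathbf{d}} \circ \psi_{\mathbf{d}'} = \psi_{\mathbf{d} \oplus \mathbf{d}'}. \tag{29.78}$$

(iii) $\psi_{\mathbf{d}}$ is equal to its inverse

$$\psi_{\mathbf{d}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr) = \mathbf{y}, \quad \mathbf{y} \in \mathbb{R}^N. \tag{29.79}$$

(iv) For every $\mathbf{d} \in \mathbb{F}_2^K$ the Jacobian of the mapping $\psi_{\mathbf{d}}(\cdot)$ is one.

(v) For every $\mathbf{d} \in \mathbb{F}_2^K$ and every $\mathbf{y} \in \mathbb{R}^N$,

$$f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr). \tag{29.80}$$

(vi) For any $\mathbf{d}, \mathbf{d}' \in \mathbb{F}_2^K$ and every $\mathbf{y} \in \mathbb{R}^N$,

$$f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\mathbf{y}). \tag{29.81}$$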


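Proof. Part (i) follows from the definition (29.77) because the linearity of $T(\cdot)$ and the definition of $\Upsilon_N$ guarantee that $x_n(\mathbf{0}) = +1$, for all $n \in \{1, \ldots, N\}$. Part (ii) follows by linearity and from (29.15):

$$\begin{aligned} (\psi_{\mathbf{d}} \circ \psi_{\mathbf{d}'})(y_1, \ldots, y_N) &= \psi_{\mathbf{d}}\bigl(y_1 x_1(\mathbf{d}'), \ldots, y_N x_N(\mathbf{d}')\bigr) \\ &= \bigl(y_1 x_1(\mathbf{d}') x_1(\mathbf{d}), \ldots, y_N x_N(\mathbf{d}') x_N(\mathbf{d})\bigr) \\ &= \bigl(y_1 x_1(\mathbf{d}' \oplus \mathbf{d}), \ldots, y_N x_N(\mathbf{d}' \oplus \mathbf{d})\bigr) \\ &= \psi_{\mathbf{d} \oplus \mathbf{d}'}(y_1, \ldots, y_N), \end{aligned}$$

where in the third equality we used (29.15) and the linearity of the encoder. Part (iii) follows from Parts (i) and (ii). Part (iv) follows from Part (iii) or directly by computing the partial derivative matrix and noting that it is diagonal with the diagonal elements being $\pm 1$ only. Part (v) follows from (29.75). To prove Part (vi) we substitute $\mathbf{d}'$ for $\mathbf{d}$ and $\psi_{\mathbf{d}}(\mathbf{y})$ for $\mathbf{y}$ in Part (v) to obtain

$$\begin{aligned} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr) &= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}'}(\psi_{\mathbf{d}}(\mathbf{y}))\bigr) \\ &= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}' \oplus \mathbf{d}}(\mathbf{y})\bigr) \\ &= f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\mathbf{y}), \end{aligned}$$

where the second equality follows from Part (ii), and where the third equality follows from Part (v).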
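The properties in Lemma 29.9.2 are easy to spot-check numerically. The sketch below does so for a hypothetical (2, 3) single parity check encoder on the Gaussian channel; the encoder, the values of `A` and `N0`, and all function names are assumptions chosen for illustration, and a check at one point is of course no substitute for the proof.

```python
import math
from itertools import product

A, N0 = 1.0, 1.0  # illustrative amplitude and noise level

def encode(d):
    """Hypothetical linear (2, 3) encoder: single parity check."""
    return (d[0], d[1], d[0] ^ d[1])

def x(d):
    """Antipodal image of the codeword: bit 0 -> +1, bit 1 -> -1."""
    return [1.0 - 2.0 * b for b in encode(d)]

def psi(d, y):
    """The mapping (29.77): flip the sign of y_n wherever x_n(d) = -1."""
    return [yn * xn for yn, xn in zip(y, x(d))]

def density(y, d):
    """Conditional density of Y given D = d for noise of variance N0/2 per component."""
    return math.prod(math.exp(-(yn - A * xn) ** 2 / N0) / math.sqrt(math.pi * N0)
                     for yn, xn in zip(y, x(d)))

def xor(d, dp):
    """Componentwise mod-2 sum of two binary tuples."""
    return tuple(a ^ b for a, b in zip(d, dp))

def check_lemma(y):
    """Return True if Parts (i), (ii), (iii), (v), (vi) hold at the point y."""
    msgs = list(product((0, 1), repeat=2))
    zero = (0, 0)
    ok = psi(zero, y) == list(y)                                        # Part (i)
    for d in msgs:
        ok &= psi(d, psi(d, y)) == list(y)                              # Part (iii)
        ok &= math.isclose(density(y, d), density(psi(d, y), zero))     # Part (v)
        for dp in msgs:
            ok &= psi(d, psi(dp, y)) == psi(xor(d, dp), y)              # Part (ii)
            ok &= math.isclose(density(psi(d, y), dp),
                               density(y, xor(dp, d)))                  # Part (vi)
    return ok

print(check_lemma([0.3, -1.2, 0.7]))
```

The sign flips are exact in floating point because they are multiplications by $\pm 1$, so the identities for the mapping itself are compared with `==`, whereas the densities are compared with a tolerance.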


With the aid of this lemma we can now justify the all-zero assumption in the analysis of the probability of a block error. We shall state the result not only for the Gaussian setup but also for the more general case where the conditional density $f_{\mathbf{Y} \mid \mathbf{X}}(\cdot)$ corresponds to a memoryless binary-input/output-symmetric channel.

Theorem 29.9.3. Consider the setup of Section 29.6 with the conditional density $f_{\mathbf{Y} \mid \mathbf{X}}(\cdot)$ corresponding to a memoryless binary-input/output-symmetric channel. Let $p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d})$ denote the conditional probability of a block error for the detector of Proposition 29.7.1, conditional on the data tuple being $\mathbf{d}$. Then,

$$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) = p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0}), \quad \mathbf{d} \in \mathbb{F}_2^K. \tag{29.82}$$


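Proof. The proof of this result is not very difficult, but there is a slight technicality that arises from the way ties are resolved. Since on the Gaussian channel ties occur with probability zero (Proposition 21.6.2), this issue could be ignored. But we prefer not to ignore it because we would like the proof to apply also to channels satisfying (29.76) that are not necessarily Gaussian. To address ties, we shall assume that they are resolved at random as in Proposition 29.7.1 (i.e., as in the Definition 21.3.2 of the MAP rule).

For every $\mathbf{d} \in \mathbb{F}_2^K$ and every $\nu \in \{1, \ldots, 2^K\}$, define the set $\mathcal{D}_{\mathbf{d},\nu} \subset \mathbb{R}^N$ to contain those $\mathbf{y} \in \mathbb{R}^N$ for which the following two conditions hold:

$$f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y}) = \max_{\mathbf{d}' \in \mathbb{F}_2^K} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}), \tag{29.83a}$$

$$\#\bigl\{\tilde{\mathbf{d}} \in \mathbb{F}_2^K : f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\mathbf{y}) = f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\bigr\} = \nu. \tag{29.83b}$$

Whenever $\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}$, the MAP rule guesses "$\mathbf{D} = \mathbf{d}$" with probability $1/\nu$. Thus,

$$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) = 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y}. \tag{29.84}$$

The key is to note that, by Lemma 29.9.2 (v), for every $\mathbf{d} \in \mathbb{F}_2^K$ and $\nu \in \{1, \ldots, 2^K\}$

$$\bigl(\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\bigr) \iff \bigl(\psi_{\mathbf{d}}(\mathbf{y}) \in \mathcal{D}_{\mathbf{0},\nu}\bigr). \tag{29.85}$$

(Please pause to verify this.) Consequently, by (29.84),

$$\begin{aligned} p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) &= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y} \\ &= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\{\mathbf{y} \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr)\, d\mathbf{y} \\ &= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\{\psi_{\mathbf{d}}(\tilde{\mathbf{y}}) \in \mathcal{D}_{\mathbf{d},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\ &= 1 - \sum_{\nu=1}^{2^K} \frac{1}{\nu} \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\{\tilde{\mathbf{y}} \in \mathcal{D}_{\mathbf{0},\nu}\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\ &= p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{0}), \end{aligned}$$

where the first equality follows from (29.84); the second by Lemma 29.9.2 (v); the third by defining $\tilde{\mathbf{y}} \triangleq \psi_{\mathbf{d}}(\mathbf{y})$ and using Parts (iv) and (iii) of Lemma 29.9.2; the fourth by (29.85); and the final equality by (29.84).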


We now formulate a similar result for the detector of Proposition 29.8.1. Let $p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{d})$ denote the conditional probability that the decoder of Proposition 29.8.1 incorrectly decodes the $\kappa$-th data bit, conditional on the tuple $\mathbf{d}$ being fed to the encoder. Since the data are IID random bits,

$$p_\kappa^* = 2^{-K} \sum_{\mathbf{d} \in \mathbb{F}_2^K} p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{d}). \tag{29.86}$$




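Since ties are resolved at random

$$\begin{aligned} p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{d}) &= \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr] \\ &\quad + \frac{1}{2} \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) = \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr], \quad \mathbf{d} \in \mathcal{A}_{\kappa,1}, && (29.87) \end{aligned}$$

and

$$\begin{aligned} p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{d}) &= \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) < \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr] \\ &\quad + \frac{1}{2} \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) = \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr], \quad \mathbf{d} \in \mathcal{A}_{\kappa,0}. && (29.88) \end{aligned}$$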


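Theorem 29.9.4. Under the assumptions of Theorem 29.9.3, we have for every $\kappa \in \{1, \ldots, K\}$

$$p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{d}) = p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{0}), \quad \mathbf{d} \in \mathbb{F}_2^K, \tag{29.89}$$

and consequently

$$p_\kappa^* = p_\kappa^*(\text{error} \mid \mathbf{D} = \mathbf{0}). \tag{29.90}$$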


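Proof. It suffices to prove (29.89) because (29.90) will then follow by (29.86). To prove (29.89) we begin by defining $e(\mathbf{d})$ for $\mathbf{d} \in \mathbb{F}_2^K$ as follows. If $\mathbf{d}$ is in $\mathcal{A}_{\kappa,1}$, then we define $e(\mathbf{d})$ as

$$e(\mathbf{d}) \triangleq \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr], \quad \mathbf{d} \in \mathcal{A}_{\kappa,1}.$$

Otherwise, if $\mathbf{d}$ is in $\mathcal{A}_{\kappa,0}$, then we define $e(\mathbf{d})$ as

$$e(\mathbf{d}) \triangleq \Pr\Bigl[\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) < \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{Y}) \Bigm| \mathbf{D} = \mathbf{d}\Bigr], \quad \mathbf{d} \in \mathcal{A}_{\kappa,0}.$$

We shall prove (29.89) for the case where

$$\mathbf{d} \in \mathcal{A}_{\kappa,1}. \tag{29.91}$$

The proof for the case where $\mathbf{d} \in \mathcal{A}_{\kappa,0}$ is almost identical and is omitted. For $\mathbf{d}$ satisfying (29.91) we shall prove that $e(\mathbf{d})$ does not depend on $\mathbf{d}$. The second term in (29.87), which accounts for the random resolution of ties, can be treated very similarly. To show that $e(\mathbf{d})$ does not depend on $\mathbf{d}$ we compute:

$$\begin{aligned} e(\mathbf{d}) &= \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\Bigl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y})\Bigr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}}(\mathbf{y})\, d\mathbf{y} \\ &= \int_{\mathbf{y} \in \mathbb{R}^N} \mathrm{I}\Bigl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}(\mathbf{y})\Bigr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}\bigl(\psi_{\mathbf{d}}(\mathbf{y})\bigr)\, d\mathbf{y} \\ &= \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\Bigl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\tilde{\mathbf{y}})\bigr) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}'}\bigl(\psi_{\mathbf{d}}(\tilde{\mathbf{y}})\bigr)\Bigr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\ &= \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\Bigl\{\sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\tilde{\mathbf{y}}) > \sum_{\mathbf{d}' \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{d}' \oplus \mathbf{d}}(\tilde{\mathbf{y}})\Bigr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\ &= \int_{\tilde{\mathbf{y}} \in \mathbb{R}^N} \mathrm{I}\Bigl\{\sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,1}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\tilde{\mathbf{y}}) > \sum_{\tilde{\mathbf{d}} \in \mathcal{A}_{\kappa,0}} f_{\mathbf{Y} \mid \mathbf{D} = \tilde{\mathbf{d}}}(\tilde{\mathbf{y}})\Bigr\}\, f_{\mathbf{Y} \mid \mathbf{D} = \mathbf{0}}(\tilde{\mathbf{y}})\, d\tilde{\mathbf{y}} \\ &= e(\mathbf{0}), \end{aligned}$$

where the second equality follows from Lemma 29.9.2 (v); the third by defining the vector $\tilde{\mathbf{y}}$ as $\tilde{\mathbf{y}} \triangleq \psi_{\mathbf{d}}(\mathbf{y})$ and by Parts (iv) and (iii) of Lemma 29.9.2; the fourth by Lemma 29.9.2 (vi); and the fifth equality by defining $\tilde{\mathbf{d}} \triangleq \mathbf{d}' \oplus \mathbf{d}$ and using (29.73).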

29.10 System Parameters 


We next summarize how the system parameters such as power, bandwidth, and block error rate are related to the parameters of the encoder. We only address the case where the pulse shape $\phi$ satisfies the orthonormality condition (29.2). As we next show, in this case the bandwidth $W$ in Hz of the pulse shape can be expressed as

$$W = \frac{R_b}{2}\, \frac{N}{K}\, (1 + \text{excess bandwidth}), \tag{29.92}$$

where $R_b$ is the bit rate at which the data are fed to the modem in bits per second, and where the excess bandwidth, which is defined in Definition 11.3.6, is nonnegative. To verify (29.92) note that if the data arrive at the encoder at the rate of $R_b$ bits per second and if the encoder produces $N$ real symbols for every $K$ bits that are fed to it, then the encoder produces real symbols at a rate

$$R_s = \frac{N}{K}\, R_b \quad \left[\frac{\text{real symbol}}{\text{second}}\right], \tag{29.93}$$

so the baud period must be

$$T_s = \frac{K}{N}\, \frac{1}{R_b}. \tag{29.94}$$

It then follows from Definition 11.3.6 that the bandwidth of $\phi$ is given by (29.92) with the excess bandwidth being nonnegative by Corollary 11.3.5.

As to the transmitted power $P$, by (29.27) and (29.94) it is given by

$$P = E_b R_b, \tag{29.95}$$

where $E_b$ denotes the energy per data bit and is given by

$$E_b = \frac{N}{K}\, A^2. \tag{29.96}$$

It is customary to describe the error probability by which one measures performance as a function of the energy-per-bit $E_b$.³ Thus, for example, one typically writes the upper bound (29.55) on the probability of a block error using (29.96) as

$$p_{\text{MAP}}(\text{error} \mid \mathbf{D} = \mathbf{d}) \le \sum_{\nu = d_{\min,H}}^{N} \#\{\mathbf{c} \in \text{Image}(T) : w_H(\mathbf{c}) = \nu\}\; Q\!\left(\sqrt{\frac{2E_b (K/N)\, \nu}{N_0}}\right). \tag{29.97}$$
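The relations (29.92) to (29.96) amount to a few lines of arithmetic, sketched below. The function name, the dictionary keys, and the numerical values are illustrative assumptions; the excess bandwidth is taken here as a fraction (0.25 for 25%), which is a convention chosen for this sketch.

```python
import math

def system_parameters(K, N, Rb, A, excess_bandwidth=0.0):
    """Relate encoder parameters (K, N), bit rate Rb, and amplitude A to system quantities."""
    Ts = K / (N * Rb)                           # baud period (29.94), in seconds
    W = (1.0 + excess_bandwidth) / (2.0 * Ts)   # bandwidth (29.92), in Hz
    Eb = (N / K) * A ** 2                       # energy per data bit (29.96)
    P = Eb * Rb                                 # transmitted power (29.95)
    return {"Ts": Ts, "W": W, "Eb": Eb, "P": P}

params = system_parameters(K=4, N=7, Rb=1e6, A=1.0, excess_bandwidth=0.25)
print(params)
```

As a consistency check, the power also equals the energy per symbol $A^2$ divided by the baud period, since one symbol of energy $A^2$ is sent every $T_s$ seconds.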


29.11 Hard vs. Soft Decisions 


In Section 29.7 we derived the decision rule that minimizes the probability of a block error. We saw that, in general, its complexity is exponential in the dimension $K$ of the code because a brute-force implementation of this rule requires correlating the $N$-tuple $\mathbf{Y}$ with each of the $2^K$ tuples in Image(enc). For the single parity check code we found a much simpler implementation of this rule, but for general codes the decoding problem can be very difficult.

A suboptimal decoding rule that is sometimes implemented is the Hard Decision decoding rule, which has two steps. In the first step one uses the observed real-valued $N$-tuple $(Y_1, \ldots, Y_N)$ to form the binary tuple $(\hat{C}_1, \ldots, \hat{C}_N)$ according to the rule

$$\hat{C}_n = \begin{cases} 0 & \text{if } Y_n \ge 0, \\ 1 & \text{if } Y_n < 0, \end{cases} \qquad n = 1, \ldots, N,$$

and in the second step one searches for the message $\mathbf{d}$ for which $T(\mathbf{d})$ is closest in Hamming distance to $(\hat{C}_1, \ldots, \hat{C}_N)$. The advantage of this decoding rule is that the first step is very simple and that the second step can often be performed very efficiently if the code has a strong algebraic structure.
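To illustrate the difference between the two approaches, here is a small sketch that contrasts hard decision decoding with correlation-based (soft decision) decoding by brute force. The (1, 3) repetition encoder and the function names are hypothetical choices for this example, and ties are resolved in favor of the first candidate, which is a simplification.

```python
from itertools import product

def hard_decision_decode(y, encode, K):
    """Step 1: threshold each Y_n at zero. Step 2: pick the closest codeword in Hamming distance."""
    c_hat = tuple(0 if yn >= 0 else 1 for yn in y)
    def hamming(c):
        return sum(a != b for a, b in zip(c, c_hat))
    return min(product((0, 1), repeat=K), key=lambda d: hamming(encode(d)))

def soft_decision_decode(y, encode, K):
    """Pick the message whose antipodal codeword has the largest correlation with y."""
    def correlation(d):
        return sum(yn * (1.0 if b == 0 else -1.0) for yn, b in zip(y, encode(d)))
    return max(product((0, 1), repeat=K), key=correlation)

def repetition_encode(d):
    """Hypothetical (1, 3) repetition encoder used only for illustration."""
    return (d[0], d[0], d[0])

y = [-0.1, -0.2, 2.5]   # two weakly negative samples and one strongly positive sample
print(hard_decision_decode(y, repetition_encode, K=1))
print(soft_decision_decode(y, repetition_encode, K=1))
```

For this observation the two rules disagree: thresholding discards the fact that the two negative samples are barely negative, whereas the correlation retains it. This is one way to see why hard decisions can degrade performance.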


29.12 The Varshamov and Singleton Bounds 


Motivated by the approximation (29.56) and by (29.58), a fair bit of effort in Coding Theory has been invested in finding $(K, N)$ codes that have a large minimum Hamming weight and reasonable decoding complexity. One of the key existence results in this area is the Varshamov Bound. We state here a special case of this bound pertaining to our binary setting.

³The terms "energy-per-bit," "energy-per-data-bit," and "energy-per-information-bit" are used interchangeably.

Theorem 29.12.1 (The Varshamov Bound). Let $K$ and $N$ be positive integers, and let $d$ be an integer in the range $2 \le d \le N - K + 1$. If

$$\sum_{\nu=0}^{d-2} \binom{N-1}{\nu} < 2^{N-K}, \tag{29.98}$$

then there exists a linear $(K, N)$ $\mathbb{F}_2$ code whose minimum distance $d_{\min,H}$ satisfies $d_{\min,H} \ge d$.


Proof. See, for example, (MacWilliams and Sloane, 1977, Chapter 1, Section 10, 
Theorem 12) or (Blahut, 2002, Chapter 12, Section 3, Theorem 12.3.3). 


A key upper bound on dyin,y is given by the Singleton Bound. 


Theorem 29.12.2 (The Singleton Bound). If $N$ and $K$ are positive integers, then the minimum Hamming distance $d_{\min,H}$ of any linear $(K, N)$ $\mathbb{F}_2$ code must satisfy

$$d_{\min,H} \le N - K + 1. \tag{29.99}$$


Proof. See, for example, (Blahut, 2002, Chapter 3, Section 3, Theorem 3.2.6) or 
(van Lint, 1998, Chapter 5, Section 2, Corollary 5.2.2) or Exercise 29.10. 
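Both bounds are simple to evaluate. The sketch below checks the sufficient condition (29.98) and computes the Singleton limit (29.99); the function names and the (11, 15) parameters are illustrative assumptions. Note that a `False` result from the Varshamov check only means that this particular bound gives no guarantee, not that no such code exists.

```python
import math

def varshamov_guarantees(K, N, d):
    """True if (29.98) holds, i.e., a linear (K, N) code with minimum distance >= d is guaranteed."""
    if not 2 <= d <= N - K + 1:
        raise ValueError("d must satisfy 2 <= d <= N - K + 1")
    # sum over nu = 0, ..., d - 2 of binomial(N - 1, nu)
    return sum(math.comb(N - 1, nu) for nu in range(d - 1)) < 2 ** (N - K)

def singleton_limit(K, N):
    """Largest minimum Hamming distance permitted by (29.99)."""
    return N - K + 1

print(varshamov_guarantees(11, 15, 3))  # 1 + 14 = 15 < 16
print(varshamov_guarantees(11, 15, 4))  # 1 + 14 + 91 = 106 is not below 16
print(singleton_limit(11, 15))
```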


29.13 Additional Reading 


We have only had a glimpse of Coding Theory. A good starting point for the 
literature on Algebraic Coding Theory is (Roth, 2006). For more on the modern 
coding techniques such as low-density parity-check codes (LDPC) and turbo-codes, 
see (Richardson and Urbanke, 2008). 


The degradation resulting from hard decisions is addressed, e.g., in (Viterbi and Omura, 1979, Chapter 3, Section 3.4).


The results of Section 29.9 can be extended also to non-binary codes with other 
mappings. See, for example, (Loeliger, 1991) and (Forney, 1991). 


For some of the literature on the minimum distance and its asymptotic behavior in the block length, see, for example, (Roth, 2006, Chapter 4).


For more on the decoding complexity see the notes on Section 2.4 in Chapter 2 of 
(Roth, 2006). 


29.14 Exercises 


Exercise 29.1 (Orthogonality of Signals). Recall that, given a binary $K$-tuple $\mathbf{d} \in \mathbb{F}_2^K$ and a linear $(K, N)$ $\mathbb{F}_2$ encoder $T(\cdot)$, we use $x_n(\mathbf{d})$ to denote the result of applying the antipodal mapping $\Upsilon(\cdot)$ to the $n$-th component of $T(\mathbf{d})$. Let the pulse shape $\phi$ be such that its time shifts by integer multiples of the baud period $T_s$ are orthonormal. Show that

$$\Bigl\langle t \mapsto A \sum_{n=1}^{N} x_n(\mathbf{d})\, \phi(t - nT_s),\; t \mapsto A \sum_{n=1}^{N} x_n(\mathbf{d}')\, \phi(t - nT_s) \Bigr\rangle = 0$$

if, and only if, $d_H\bigl(T(\mathbf{d}), T(\mathbf{d}')\bigr) = N/2$.


Exercise 29.2 (How Many Encoders Does a Code Have?). Let the linear $(K, N)$ $\mathbb{F}_2$ encoder $T \colon \mathbb{F}_2^K \to \mathbb{F}_2^N$ be represented by the $K \times N$ matrix $\mathsf{G}$. Show that any linear $(K, N)$ $\mathbb{F}_2$ encoder whose image is equal to the image of $T$ can be written in the form

$$\mathbf{d} \mapsto \mathbf{d} \mathsf{A} \mathsf{G},$$

where $\mathsf{A}$ is a $K \times K$ invertible matrix whose entries are in $\mathbb{F}_2$. How many such matrices $\mathsf{A}$ are there?


Exercise 29.3 (The (4,7) Hamming Code). A systematic encoder for the linear $(4, 7)$ $\mathbb{F}_2$ Hamming code maps the four data bits $d_1, d_2, d_3, d_4$ to the 7-tuple

$$(d_1,\, d_2,\, d_3,\, d_4,\, d_1 \oplus d_3 \oplus d_4,\, d_1 \oplus d_2 \oplus d_4,\, d_2 \oplus d_3 \oplus d_4).$$

Suppose that this encoder is used in conjunction with the componentwise antipodal mapping $\Upsilon_7(\cdot)$ over the white Gaussian noise channel with PAM with a pulse shape whose time shifts by integer multiples of the baud period are orthonormal.

(i) Write out the 16 binary codewords and compute the code's weight enumerator.

(ii) Assuming that the codewords are equally likely and that the decoding minimizes the probability of a message error, use the Union Bound to upper-bound the probability of codeword error. Express your bound using the transmitted energy per bit $E_b$.

(iii) Find a lower bound on the probability that the first bit $D_1$ is incorrectly decoded. Express your bound in terms of the energy per bit. Compare with the exact expression in uncoded communication.


Exercise 29.4 (The Repetition Code). Consider the linear $(1, N)$ $\mathbb{F}_2$ repetition code consisting of the all-zero and all-one $N$-tuples $(0, \ldots, 0)$ and $(1, \ldots, 1)$.

(i) Find its weight enumerator.

(ii) Find an optimal decoder for a system employing this code with the componentwise antipodal mapping $\Upsilon_N(\cdot)$ over the white Gaussian noise channel in conjunction with PAM with a pulse shape whose time shifts by integer multiples of the baud period are orthonormal.

(iii) Find the optimal probability of error. Express your answer using the energy per bit $E_b$. Compare with uncoded antipodal signaling.

(iv) Describe the hard decision rule for this setup. Find its performance in terms of $E_b$.


Exercise 29.5 (The Dual Code). We say that two binary $\kappa$-tuples $\mathbf{u} = (u_1, \ldots, u_\kappa)$ and $\mathbf{v} = (v_1, \ldots, v_\kappa)$ are orthogonal if

$$u_1 \cdot v_1 \oplus u_2 \cdot v_2 \oplus \cdots \oplus u_\kappa \cdot v_\kappa = 0.$$

Consider the set of all $N$-tuples that are orthogonal to every codeword of some given linear $(K, N)$ $\mathbb{F}_2$ code. Show that this set is a linear $(N - K, N)$ $\mathbb{F}_2$ code. This code is called the dual code. What is the dual code of the $(K, K + 1)$ single parity check code?

Exercise 29.6 (Hadamard Code). For a positive integer $N$ which is a power of two, define the $N \times N$ binary matrix $\mathsf{H}_N$ recursively as

$$\mathsf{H}_2 = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}, \qquad \mathsf{H}_N = \begin{pmatrix} \mathsf{H}_{N/2} & \mathsf{H}_{N/2} \\ \mathsf{H}_{N/2} & \bar{\mathsf{H}}_{N/2} \end{pmatrix}, \quad N = 4, 8, 16, \ldots, \tag{29.100}$$

where $\bar{\mathsf{H}}$ denotes the componentwise negation of the matrix $\mathsf{H}$, that is, the matrix whose Row-$j$ Column-$\ell$ element is given by $1 \oplus [\mathsf{H}]_{j,\ell}$, where $[\mathsf{H}]_{j,\ell}$ is the Row-$j$ Column-$\ell$ element of $\mathsf{H}$. Consider the set of all rows of $\mathsf{H}_N$.

(i) Show that this collection of $N$ binary $N$-tuples forms a linear $(\log_2 N, N)$ $\mathbb{F}_2$ code. This code is called the Hadamard code. Find this code's weight enumerator.

(ii) Suppose that, as in Section 29.6, this code is used in conjunction with PAM over the white Gaussian noise channel and that $Y_1, \ldots, Y_N$ are as defined there. Show that the following rule minimizes the probability of a message error: compute the vector

$$\tilde{\mathsf{H}}_N \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix} \tag{29.101}$$

and guess that the $m$-th message was sent if the $m$-th component of this vector is largest. Here $\tilde{\mathsf{H}}_N$ is the $N \times N$ matrix whose Row-$j$ Column-$\ell$ entry is the result of applying $\Upsilon(\cdot)$ to the Row-$j$ Column-$\ell$ entry of $\mathsf{H}_N$.

(iii) A brute-force computation of the vector in (29.101) requires $N^2$ additions, which translates to $N^2/\log_2 N$ additions per information bit. Use the structure of $\mathsf{H}_N$ that is given in (29.100) to show that this can be done with $N \log_2 N$ additions (or $N$ additions per information bit).

Hint: For Part (iii) provide an algorithm for which $c(N) = 2c(N/2) + N$, where $c(n)$ denotes the number of additions needed to compute this vector when the matrix is $n \times n$. Show that the solution to this recursion for $c(2) = 2$ is $c(n) = n \log_2 n$.
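The recursion in the hint can be explored with a short divide-and-conquer sketch. It is one possible approach and not necessarily the intended solution; it assumes that the antipodal mapping sends 0 to $+1$ and 1 to $-1$, and it adopts the convention (an assumption made here) that the trivial $1 \times 1$ matrix is $(+1)$. The function names are illustrative.

```python
def fast_hadamard(y):
    """Return (product of the +/-1 Hadamard matrix with y, number of additions/subtractions)."""
    n = len(y)
    if n == 1:
        return list(y), 0
    half = n // 2
    top, c_top = fast_hadamard(y[:half])    # half-size product with the first half of y
    bot, c_bot = fast_hadamard(y[half:])    # half-size product with the second half of y
    out = [a + b for a, b in zip(top, bot)] + [a - b for a, b in zip(top, bot)]
    return out, c_top + c_bot + n           # c(n) = 2 c(n/2) + n

def plus_minus_matrix(n):
    """Brute-force +/-1 matrix obtained from (29.100) after the antipodal mapping."""
    if n == 1:
        return [[1]]
    h = plus_minus_matrix(n // 2)
    return [r + r for r in h] + [r + [-v for v in r] for r in h]

def brute_force(y):
    """Direct matrix-vector product, for comparison with the fast version."""
    return [sum(h * v for h, v in zip(row, y)) for row in plus_minus_matrix(len(y))]

y = [1, 2, 3, 4]
result, additions = fast_hadamard(y)
print(result, additions)
print(brute_force(y))
```

For a length-8 input the counter returns 24, in agreement with $c(n) = n \log_2 n$.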


Exercise 29.7 (Bi-Orthogonal Code). Referring to the notation introduced in Exercise 29.6, consider the $2N \times N$ matrix

$$\begin{pmatrix} \mathsf{H}_N \\ \bar{\mathsf{H}}_N \end{pmatrix},$$

where $N$ is some positive power of two.

(i) Show that the rows of this matrix form a linear $(\log_2(2N), N)$ $\mathbb{F}_2$ code.

(ii) Compute the code's weight enumerator.

(iii) Explain why we chose the title "Bi-Orthogonal Code" for this exercise.

(iv) Find an efficient decoding algorithm for the setup of Section 29.6.

Exercise 29.8 (Non-IID Data). How would you modify the decision rule of Section 29.8 if the data bits $(D_1, \ldots, D_K)$ are not necessarily IID but have the general joint probability mass function $P_{\mathbf{D}}(\cdot)$?


Exercise 29.9 (Asymmetric Channels). Show that Theorem 29.9.3 will no longer hold if 
we drop the hypothesis that the channel is symmetric.

Exercise 29.10 (A Proof of the Singleton Bound). Use the following steps to prove the Singleton Bound.

(i) Consider a linear $(K, N)$ $\mathbb{F}_2$ code. Let $\pi \colon \mathbb{F}_2^N \to \mathbb{F}_2^{K-1}$ map each $N$-tuple to the $(K - 1)$-tuple consisting of its first $K - 1$ components. By comparing the number of codewords with the cardinality of the range of $\pi$, argue that there must exist two codewords whose first $K - 1$ components are identical.

(ii) Show that these two codewords are at Hamming distance of at most $N - K + 1$.

(iii) Show that the minimum Hamming distance of the code is at most $N - K + 1$.


(iv) Does linearity play a role in the proof? 


Exercise 29.11 (Binary MDS Codes). Codes that satisfy the Singleton Bound with equal- 
ity are called Maximum Distance Separable (MDS). Show that the linear (K, K + 1) F2 
single parity check code is MDS. Can you think of other binary MDS codes? 


Exercise 29.12 (Existence via the Varshamov Bound). Can the existence of a linear (4, 7) 
F2 code of minimum Hamming distance 3 be deduced from the Varshamov Bound? 


Appendix A 


On the Fourier Series 


A.1 Introduction and Preliminaries 


We survey here some of the results on the Fourier Series that are used in the book. 
The Fourier Series has numerous other applications that we do not touch upon. 
For those we refer the reader to (Katznelson, 1976), (Dym and McKean, 1972), and (Körner, 1988).


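To simplify typography, we denote the half-open interval $[-1/2, 1/2)$ by $\mathbb{I}$:

$$\mathbb{I} \triangleq \Bigl\{\theta \in \mathbb{R} : -\frac{1}{2} \le \theta < \frac{1}{2}\Bigr\}. \tag{A.1}$$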


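Definition A.1.1 (Fourier Series Coefficient). The $\eta$-th Fourier Series Coefficient of an integrable function $g \colon \mathbb{I} \to \mathbb{C}$ is denoted by $\hat{g}(\eta)$ and is defined for every integer $\eta$ by

$$\hat{g}(\eta) \triangleq \int_{\mathbb{I}} g(\theta)\, e^{-i2\pi\eta\theta}\, d\theta. \tag{A.2}$$

The periodic extension of the function $g \colon \mathbb{I} \to \mathbb{C}$ is denoted by $g_P \colon \mathbb{R} \to \mathbb{C}$ and is defined as

$$g_P(n + \theta) = g(\theta), \quad (n \in \mathbb{Z},\ \theta \in \mathbb{I}). \tag{A.3}$$

We say that $g \colon \mathbb{I} \to \mathbb{C}$ is periodically continuous if its periodic extension $g_P$ is continuous, i.e., if $g(\cdot)$ is continuous in $\mathbb{I}$ and if, additionally,

$$\lim_{\theta \uparrow 1/2} g(\theta) = g(-1/2). \tag{A.4}$$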


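A degree-$n$ trigonometric polynomial is a function of the form

$$\theta \mapsto \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta}, \quad \theta \in \mathbb{R}, \tag{A.5}$$

where $a_n$ and $a_{-n}$ are not both zero. Note that if $p(\cdot)$ is a trigonometric polynomial, then $p(\theta + 1) = p(\theta)$ for all $\theta \in \mathbb{R}$.

If $g \colon \mathbb{I} \to \mathbb{C}$ is integrable, and if $p(\cdot)$ is a trigonometric polynomial, then we define the convolution $g \star p$ at every $\theta \in \mathbb{R}$ as

$$\begin{aligned} (g \star p)(\theta) &= \int_{\mathbb{I}} g(\vartheta)\, p(\theta - \vartheta)\, d\vartheta && \text{(A.6)} \\ &= \int_{\mathbb{I}} p(\vartheta)\, g_P(\theta - \vartheta)\, d\vartheta. && \text{(A.7)} \end{aligned}$$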


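Lemma A.1.2 (Convolution with a Trigonometric Polynomial). The convolution of an integrable function $g \colon \mathbb{I} \to \mathbb{C}$ with the trigonometric polynomial

$$\theta \mapsto \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta} \tag{A.8}$$

is the trigonometric polynomial

$$\theta \mapsto \sum_{\eta=-n}^{n} \hat{g}(\eta)\, a_\eta\, e^{i2\pi\eta\theta}, \quad \theta \in \mathbb{R}. \tag{A.9}$$

Proof. Denote the trigonometric polynomial in (A.8) by $p(\cdot)$. By swapping summation and integration we obtain

$$\begin{aligned} (g \star p)(\theta) &= \int_{\mathbb{I}} g(\vartheta)\, p(\theta - \vartheta)\, d\vartheta \\ &= \int_{\mathbb{I}} g(\vartheta) \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta(\theta - \vartheta)}\, d\vartheta \\ &= \sum_{\eta=-n}^{n} \int_{\mathbb{I}} g(\vartheta)\, a_\eta\, e^{i2\pi\eta(\theta - \vartheta)}\, d\vartheta \\ &= \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta} \int_{\mathbb{I}} g(\vartheta)\, e^{-i2\pi\eta\vartheta}\, d\vartheta \\ &= \sum_{\eta=-n}^{n} a_\eta\, e^{i2\pi\eta\theta}\, \hat{g}(\eta), \quad \theta \in \mathbb{R}. \end{aligned}$$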

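Definition A.1.3 (Fejér's Kernel). Fejér's degree-$n$ kernel $k_n$ is the trigonometric polynomial

$$\begin{aligned} k_n(\theta) &= \sum_{\eta=-n}^{n} \Bigl(1 - \frac{|\eta|}{n+1}\Bigr) e^{i2\pi\eta\theta} && \text{(A.10a)} \\ &= \begin{cases} n + 1 & \text{if } \theta \in \mathbb{Z}, \\ \dfrac{1}{n+1} \Bigl(\dfrac{\sin\bigl((n+1)\pi\theta\bigr)}{\sin(\pi\theta)}\Bigr)^2 & \text{if } \theta \in \mathbb{R} \setminus \mathbb{Z}. \end{cases} && \text{(A.10b)} \end{aligned}$$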

The key properties of Fejér's kernel are that it is nonnegative

$$k_n(\theta) \ge 0, \quad \theta \in \mathbb{R}; \tag{A.11a}$$

that it integrates over $\mathbb{I}$ to one

$$\int_{\mathbb{I}} k_n(\theta)\, d\theta = 1; \tag{A.11b}$$

and that for every fixed $0 < \delta < 1/2$

$$\lim_{n \to \infty} \int_{\delta < |\theta| < \frac{1}{2}} k_n(\theta)\, d\theta = 0. \tag{A.11c}$$

Here (A.11a) follows from (A.10b); (A.11b) follows from (A.10a) by term-by-term integration over $\mathbb{I}$; and (A.11c) follows from the inequality

$$k_n(\theta) \le \frac{1}{n+1} \Bigl(\frac{1}{\sin \pi\delta}\Bigr)^2, \quad \delta \le |\theta| \le \frac{1}{2},$$

which follows from (A.10b) by upper-bounding the numerator by 1 and by using the monotonicity of $\sin^2(\pi\theta)$ in $|\theta| \in [0, 1/2]$.
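The properties above can be checked numerically with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and the integral in (A.11b) is approximated by a midpoint rule, which for a trigonometric polynomial of degree $n$ is exact up to rounding whenever more than $n$ equally spaced points are used.

```python
import cmath
import math

def fejer_sum(n, theta):
    """Fejer's kernel via the defining sum (A.10a)."""
    return sum((1 - abs(eta) / (n + 1)) * cmath.exp(2j * math.pi * eta * theta)
               for eta in range(-n, n + 1)).real

def fejer_closed(n, theta):
    """Fejer's kernel via the closed form (A.10b)."""
    if float(theta).is_integer():
        return float(n + 1)
    return (math.sin((n + 1) * math.pi * theta) / math.sin(math.pi * theta)) ** 2 / (n + 1)

def integral_over_I(n, points=64):
    """Midpoint-rule approximation of the integral of k_n over [-1/2, 1/2)."""
    return sum(fejer_sum(n, -0.5 + (j + 0.5) / points) for j in range(points)) / points

def tail_bound(n, delta):
    """Upper bound on k_n(theta) for delta <= |theta| <= 1/2."""
    return 1.0 / ((n + 1) * math.sin(math.pi * delta) ** 2)

n = 5
print(fejer_sum(n, 0.2), fejer_closed(n, 0.2))
print(integral_over_I(n))
```

The tail bound shrinks like $1/(n+1)$ for fixed $\delta$, which is the mechanism behind (A.11c).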


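For an integrable function $g \colon \mathbb{I} \to \mathbb{C}$, we define for every $n \in \mathbb{N}$ and $\theta \in \mathbb{R}$

$$\begin{aligned} \sigma_n(g, \theta) &\triangleq (g \star k_n)(\theta) && \text{(A.12)} \\ &= \sum_{\eta=-n}^{n} \Bigl(1 - \frac{|\eta|}{n+1}\Bigr) \hat{g}(\eta)\, e^{i2\pi\eta\theta}, && \text{(A.13)} \end{aligned}$$

where the second equality follows from (A.10a) and Lemma A.1.2. We also define for $g \colon \mathbb{R} \to \mathbb{C}$ or $g \colon \mathbb{I} \to \mathbb{C}$

$$\|g\|_{\mathbb{I},1} \triangleq \int_{\mathbb{I}} |g(\theta)|\, d\theta. \tag{A.14}$$

Finally, for every function $h \colon \mathbb{I} \to \mathbb{C}$ and $\vartheta \in \mathbb{R}$ we define the mapping $h_\vartheta \colon \mathbb{R} \to \mathbb{C}$ as

$$h_\vartheta \colon \theta \mapsto h_P(\theta - \vartheta). \tag{A.15}$$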


A.2 Reconstruction in £, 


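Lemma A.2.1. If $g \colon \mathbb{R} \to \mathbb{C}$ is integrable over $\mathbb{I}$ and $g(\theta + 1) = g(\theta)$ for every $\theta \in \mathbb{R}$, then

$$\lim_{\vartheta \to 0} \int_{\mathbb{I}} |g(\theta) - g(\theta - \vartheta)|\, d\theta = 0. \tag{A.16}$$

Proof. This is easy to see if $g$ is continuous, because in this case $g$ is uniformly continuous. The general result follows from this case by picking a periodic continuous function $h$ that approximates $g$ in the sense that $\|g - h\|_{\mathbb{I},1} < \epsilon/2$; by computing

$$\begin{aligned} \|g - g_\vartheta\|_{\mathbb{I},1} &= \|g - h + h - g_\vartheta\|_{\mathbb{I},1} \\ &= \|g - h + h - h_\vartheta + h_\vartheta - g_\vartheta\|_{\mathbb{I},1} \\ &\le \|g - h\|_{\mathbb{I},1} + \|h - h_\vartheta\|_{\mathbb{I},1} + \|h_\vartheta - g_\vartheta\|_{\mathbb{I},1} \\ &= \|g - h\|_{\mathbb{I},1} + \|h - h_\vartheta\|_{\mathbb{I},1} + \|h - g\|_{\mathbb{I},1} \\ &< \epsilon + \|h - h_\vartheta\|_{\mathbb{I},1}; && \text{(A.17)} \end{aligned}$$

and by then applying the result to $h$, which is continuous.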


Theorem A.2.2 (Reconstruction in £,). If g: 1— C is integrable, then 


lim [\ — on(g, 9)|d6 = 0. (A.18) 


n—oCo 


Proof. Let gp be the periodic extension of g. Then for every 6 € (0, 1/2), 

[is — on(g,0)| dd = / gp(0) — [ot — 0) kn (0) dd 

= ‘: [klar — gp(6 — 8) ao| dd, 

=| Ne ken (8) (gp (8) — gn(0—v)) a0 00 (A.19) 
—6 5<|9\<4 


where the first equality follows from the definition of o,(g, 6) (A.12), and where the 
second equality follows from (A.11b). We now bound the two integrals in (A.19) 


separately: 
J 


dé 


6 
[/,¥0\(ae(@) ~ 9o(8— 0) a0 a 


6 
6 
< iy / _ka(0)|9(0) ~ ge (0 — 9) | a0-a0 (A.20) 
6 
=f [roo) ge(8) — gp(0 — 8) | d0.ad 
15 
= | t0(0) [\ge(0) — go(@ —»)| a0a0 
—6 I 


6 
< | bn(0) max { [lan(0) —an(0 — 020} ao 
< [n(0) mas { f\ge() - gn(0 - 0°) a0} av 
= max i lgv(8) — 9p(8 — 8)| a8, (A.21) 


where the first inequality follows from the Triangle Inequality for Integrals (Propo- 
sition 2.4.1) and the nonnegativity of k,,(-) (A.lla), and where the last equality 
follows because k,,(-) integrates to one (A.11b). 


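The second integral in (A.19) is bounded as follows:

\[
\begin{aligned}
\int_I \Bigl| \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \bigl( g_P(\theta) - g_P(\theta - \vartheta) \bigr) \, d\vartheta \Bigr| \, d\theta
&\le \int_I \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, |g_P(\theta) - g_P(\theta - \vartheta)| \, d\vartheta \, d\theta \\
&= \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \int_I |g_P(\theta) - g_P(\theta - \vartheta)| \, d\theta \, d\vartheta \\
&\le \max_{\vartheta' \in I} \Bigl\{ \int_I |g_P(\theta) - g_P(\theta - \vartheta')| \, d\theta \Bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta \\
&\le 2 \, \|g\|_{I,1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta.
\end{aligned}
\tag{A.22}
\]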
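From (A.19), (A.21), and (A.22) we obtain

\[
\int_I |g(\theta) - \sigma_n(g, \theta)| \, d\theta
\le \max_{|\vartheta'| \le \delta} \int_I |g_P(\theta - \vartheta') - g_P(\theta)| \, d\theta
 + 2 \, \|g\|_{I,1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta.
\tag{A.23}
\]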


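Inequality (A.23) establishes the theorem as follows. For every $\epsilon > 0$ we can find
by Lemma A.2.1 some $\delta > 0$ such that

\[ \max_{|\vartheta'| \le \delta} \int_I |g_P(\theta - \vartheta') - g_P(\theta)| \, d\theta < \epsilon, \tag{A.24} \]

and keeping this $\delta > 0$ fixed we have by (A.11c)

\[ \lim_{n \to \infty} 2 \, \|g\|_{I,1} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta = 0. \tag{A.25} \]

It thus follows from (A.23), (A.24), and (A.25) that

\[ \varlimsup_{n \to \infty} \int_I |g(\theta) - \sigma_n(g, \theta)| \, d\theta < \epsilon, \tag{A.26} \]

from which the theorem follows because $\epsilon > 0$ was arbitrary.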
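
Theorem A.2.2 can be checked numerically on a simple example. The sketch below is illustrative only. It assumes that $\sigma_n(g, \theta)$ takes the standard Cesàro form $\sum_{|\eta| < n} (1 - |\eta|/n) \, \hat{g}(\eta) \, e^{i 2\pi \eta \theta}$ (the definitions (A.12) and (A.13) are not restated in this section), it approximates every integral over $I$ by a midpoint rule, and it uses a hypothetical test function, the indicator of $[0, 1/4)$. All function names are ours.

```python
import cmath
import math

NUM_POINTS = 1000  # midpoint-rule resolution on I = [-1/2, 1/2)
GRID = [-0.5 + (k + 0.5) / NUM_POINTS for k in range(NUM_POINTS)]


def step_function(theta):
    """Hypothetical integrable test function: the indicator of [0, 1/4)."""
    return 1.0 if 0.0 <= theta < 0.25 else 0.0


def fourier_coefficient(g, eta):
    """Midpoint approximation of the integral over I of g(theta) exp(-i 2 pi eta theta)."""
    total = sum(g(t) * cmath.exp(-2j * math.pi * eta * t) for t in GRID)
    return total / NUM_POINTS


def fejer_mean(coeffs, n, theta):
    """Assumed form of sigma_n: weights (1 - |eta|/n) on the coefficients with |eta| < n."""
    return sum(
        (1.0 - abs(eta) / n) * c * cmath.exp(2j * math.pi * eta * theta)
        for eta, c in coeffs.items()
    )


def l1_error(g, n):
    """Midpoint approximation of the integral over I of |g - sigma_n(g, .)|."""
    coeffs = {eta: fourier_coefficient(g, eta) for eta in range(-(n - 1), n)}
    total = sum(abs(g(t) - fejer_mean(coeffs, n, t)) for t in GRID)
    return total / NUM_POINTS


if __name__ == "__main__":
    # The printed errors should shrink as n grows, even though g has jumps.
    for n in (4, 16, 64):
        print(n, l1_error(step_function, n))
```

The test function is discontinuous, so uniform convergence is out of reach, yet the integrated error still decreases with $n$, which is what the theorem asserts.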


From Theorem A.2.2 we obtain: 


Theorem A.2.3 (Uniqueness Theorem). Let $g_1, g_2\colon I \to \mathbb{C}$ be integrable. If
\[ \hat{g}_1(\eta) = \hat{g}_2(\eta), \quad \eta \in \mathbb{Z}, \tag{A.27} \]
then $g_1$ and $g_2$ are equal except on a set of Lebesgue measure zero.
Proof. Let $g = g_1 - g_2$. By (A.27)
\[ \hat{g}(\eta) = 0, \quad \eta \in \mathbb{Z}, \tag{A.28} \]

and consequently, by (A.13), $\sigma_n(g, \theta) = 0$ for every $n \in \mathbb{N}$ and $\theta \in I$. By
Theorem A.2.2

\[ \lim_{n \to \infty} \int_I |g(\theta) - \sigma_n(g, \theta)| \, d\theta = 0, \tag{A.29} \]
which combines with (A.28) to establish that

\[ \int_I |g(\theta)| \, d\theta = 0. \]

Thus, $g$ is zero except on a set of Lebesgue measure zero (Proposition 2.5.3 (i)),
and the result follows by recalling that $g = g_1 - g_2$.


Theorem A.2.4 (Riemann-Lebesgue Lemma). If $g\colon I \to \mathbb{C}$ is integrable, then
\[ \lim_{|\eta| \to \infty} \hat{g}(\eta) = 0. \tag{A.30} \]
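Proof. Given any $\epsilon > 0$, let $p$ be a degree-$n$ trigonometric polynomial satisfying

\[ \int_I |g(\theta) - p(\theta)| \, d\theta < \epsilon. \tag{A.31} \]

(Such a trigonometric polynomial exists for some $n \in \mathbb{N}$ by Theorem A.2.2.)
Expressing $g$ as $(g - p) + p$ and using the linearity of the Fourier Series Coefficients
we obtain for every integer $\eta$ whose magnitude exceeds the degree $n$ of $p$

\[
\begin{aligned}
|\hat{g}(\eta)| &= \bigl| \widehat{(g - p)}(\eta) + \hat{p}(\eta) \bigr| \\
&= \bigl| \widehat{(g - p)}(\eta) \bigr| \\
&\le \int_I |g(\theta) - p(\theta)| \, d\theta \\
&< \epsilon,
\end{aligned}
\tag{A.32}
\]

where the equality in the first line follows from the linearity of the Fourier Series
Coefficient; the equality in the second line because $|\eta|$ is larger than the degree $n$
of $p$; the inequality in the third line because for every integrable $h\colon I \to \mathbb{C}$ we have

\[
\begin{aligned}
|\hat{h}(\eta)| &= \Bigl| \int_I h(\theta) \, e^{-i 2\pi \eta \theta} \, d\theta \Bigr| \\
&\le \int_I \bigl| h(\theta) \, e^{-i 2\pi \eta \theta} \bigr| \, d\theta \\
&= \int_I |h(\theta)| \, d\theta, \quad \eta \in \mathbb{Z};
\end{aligned}
\]

and where the inequality in the last line of (A.32) follows from (A.31).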
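
As a small illustration of the Riemann-Lebesgue Lemma and of the bound $|\hat{h}(\eta)| \le \int_I |h(\theta)| \, d\theta$ used in its proof, the sketch below evaluates the Fourier Series Coefficients of a hypothetical test function, the indicator of $[0, a)$ with $0 < a \le 1/2$, in closed form. The function names and the choice $a = 0.3$ are ours.

```python
import cmath
import math


def indicator_coefficient(a, eta):
    """Closed-form eta-th Fourier Series Coefficient of the indicator of [0, a):
    the integral of exp(-i 2 pi eta theta) over [0, a), for 0 < a <= 1/2."""
    if eta == 0:
        return complex(a)
    return (1 - cmath.exp(-2j * math.pi * eta * a)) / (2j * math.pi * eta)


def l1_norm_of_indicator(a):
    """Integral over I of |g| for the indicator of [0, a); it bounds every |g_hat(eta)|."""
    return a


if __name__ == "__main__":
    a = 0.3
    # The magnitudes never exceed a and decay at least like 1/(pi |eta|).
    for eta in (1, 10, 100, 1000):
        print(eta, abs(indicator_coefficient(a, eta)))
```

For this particular function the magnitude equals $|\sin(\pi \eta a)| / (\pi |\eta|)$, so the decay is visible directly; the lemma guarantees decay (without a rate) for every integrable function.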


A.3 Geometric Considerations 


Every square-integrable function that is zero outside the interval [—1/2, 1/2] is also 
integrable (Proposition 3.4.3). For such functions we can discuss the inner product 
and some of the related geometry. The main result is the following. 


Theorem A.3.1 (Complete Orthonormal System). The bi-infinite sequence of
functions $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ defined for every $\eta \in \mathbb{Z}$ by

\[ \phi_\eta(\theta) = e^{i 2\pi \eta \theta} \, I\{\theta \in I\}, \quad \theta \in \mathbb{R} \]

forms a complete orthonormal system for the subspace of $L_2$ consisting of those
energy-limited functions that are zero outside the interval $I$.


Proof. The orthonormality follows by direct calculation
\[ \int_I e^{i 2\pi \eta \theta} \, e^{-i 2\pi \eta' \theta} \, d\theta = I\{\eta = \eta'\}, \quad \eta, \eta' \in \mathbb{Z}. \tag{A.33} \]

To show completeness it suffices by Proposition 8.5.5 (ii) to show that a
square-integrable function $g\colon I \to \mathbb{C}$ that satisfies

\[ \langle g, \phi_\eta \rangle = 0, \quad \eta \in \mathbb{Z} \tag{A.34} \]
must be equal to the all-zero function except on a subset of $I$ of Lebesgue measure
zero. To show this, we note that

\[ \langle g, \phi_\eta \rangle = \hat{g}(\eta), \quad \eta \in \mathbb{Z}, \tag{A.35} \]

so (A.34) is equivalent to

\[ \hat{g}(\eta) = 0, \quad \eta \in \mathbb{Z}, \tag{A.36} \]
and hence, by Theorem A.2.3, $g$ must be zero except on a set of Lebesgue measure
zero.
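
The orthonormality relation (A.33) is easy to confirm numerically. The following is a minimal sketch using a midpoint rule on $I$; the helper names are ours, and the inner product is taken as the integral of the first function times the complex conjugate of the second.

```python
import cmath
import math

NUM_POINTS = 1000  # midpoint-rule resolution on I = [-1/2, 1/2)
GRID = [-0.5 + (k + 0.5) / NUM_POINTS for k in range(NUM_POINTS)]


def phi(eta, theta):
    """phi_eta(theta) = exp(i 2 pi eta theta) for theta in I, and zero otherwise."""
    return cmath.exp(2j * math.pi * eta * theta) if -0.5 <= theta < 0.5 else 0j


def inner_product(eta, eta_prime):
    """Midpoint approximation of <phi_eta, phi_eta'>."""
    total = sum(phi(eta, t) * phi(eta_prime, t).conjugate() for t in GRID)
    return total / NUM_POINTS


if __name__ == "__main__":
    print(abs(inner_product(3, 3)))  # close to one
    print(abs(inner_product(3, 5)))  # close to zero
```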


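Recalling Definition 8.2.1 and Proposition 8.2.2 (d) we obtain that, because the
functions $\ldots, \phi_{-1}, \phi_0, \phi_1, \ldots$ form a CONS and because $\langle g, \phi_\eta \rangle = \hat{g}(\eta)$, we have:


Theorem A.3.2. Let $g, h\colon I \to \mathbb{C}$ be square integrable. Then

\[ \int_I |g(\theta)|^2 \, d\theta = \sum_{\eta=-\infty}^{\infty} |\hat{g}(\eta)|^2 \tag{A.37} \]
and
\[ \int_I g(\theta) \, h^*(\theta) \, d\theta = \sum_{\eta=-\infty}^{\infty} \hat{g}(\eta) \, \hat{h}^*(\eta). \tag{A.38} \]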
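
To see (A.37) at work, consider the hypothetical test function $g(\theta) = \theta$ on $I$. Integration by parts gives $\hat{g}(0) = 0$ and $|\hat{g}(\eta)| = 1/(2\pi|\eta|)$ for $\eta \ne 0$, while $\int_I \theta^2 \, d\theta = 1/12$. The sketch below, with names of our choosing, compares partial sums of $|\hat{g}(\eta)|^2$ with this energy.

```python
import math

ENERGY = 1.0 / 12.0  # integral of theta^2 over [-1/2, 1/2)


def ramp_coefficient_magnitude(eta):
    """|g_hat(eta)| for g(theta) = theta on I: zero at eta = 0, else 1/(2 pi |eta|)."""
    return 0.0 if eta == 0 else 1.0 / (2.0 * math.pi * abs(eta))


def partial_energy_sum(max_eta):
    """Sum of |g_hat(eta)|^2 over all integers eta with |eta| <= max_eta."""
    return sum(
        ramp_coefficient_magnitude(eta) ** 2 for eta in range(-max_eta, max_eta + 1)
    )


if __name__ == "__main__":
    # The partial sums increase towards the energy of g.
    for max_eta in (10, 100, 10000):
        print(max_eta, partial_energy_sum(max_eta), ENERGY)
```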


There is nothing special about the interval I, and, indeed, by scaling we obtain: 


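Theorem A.3.3. Let $S$ be positive.


(i) The bi-infinite sequence of functions defined for every $\eta \in \mathbb{Z}$ by
\[ \xi \mapsto \frac{1}{\sqrt{S}} \, e^{i 2\pi \eta \xi / S} \, I\Bigl\{ -\frac{S}{2} \le \xi < \frac{S}{2} \Bigr\}, \quad \xi \in \mathbb{R} \tag{A.39} \]

forms a CONS for the class of square-integrable functions that are zero
outside the interval $[-S/2, S/2)$.
(ii) If $g$ is square integrable and zero outside the interval $[-S/2, S/2)$, then

\[ \int_{-S/2}^{S/2} |g(\xi)|^2 \, d\xi = \sum_{\eta=-\infty}^{\infty} \Bigl| \int_{-S/2}^{S/2} g(\xi) \, \frac{1}{\sqrt{S}} \, e^{-i 2\pi \eta \xi / S} \, d\xi \Bigr|^2. \tag{A.40} \]

(iii) If $g, h\colon \mathbb{R} \to \mathbb{C}$ are square integrable and zero outside the interval $[-S/2, S/2)$,
then

\[ \int_{-S/2}^{S/2} g(\xi) \, h^*(\xi) \, d\xi = \sum_{\eta=-\infty}^{\infty} \Bigl( \int_{-S/2}^{S/2} g(\xi) \, \frac{1}{\sqrt{S}} \, e^{-i 2\pi \eta \xi / S} \, d\xi \Bigr) \Bigl( \int_{-S/2}^{S/2} h(\xi) \, \frac{1}{\sqrt{S}} \, e^{-i 2\pi \eta \xi / S} \, d\xi \Bigr)^{*}. \]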
Note A.3.4. The theorem continues to hold if we replace the half-open interval
with the open interval $(-S/2, S/2)$ or with the closed interval $[-S/2, S/2]$, because
the integrals are insensitive to these replacements.


Note A.3.5. We refer to
\[ \int_{-S/2}^{S/2} g(\xi) \, \frac{1}{\sqrt{S}} \, e^{-i 2\pi \eta \xi / S} \, d\xi \]

as the $\eta$-th Fourier Series Coefficient of $g$ with respect to the interval
$[-S/2, S/2)$.
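
The scaled coefficient of Note A.3.5 can be approximated directly from its defining integral. The following sketch uses a midpoint rule and a hypothetical test function that is identically one on the interval; for it, the coefficient for $\eta = 0$ equals $\sqrt{S}$ and all others vanish, so the sum in (A.40) equals $S$, the energy of the function. The names and the default resolution are ours.

```python
import cmath
import math


def scaled_coefficient(g, eta, S, num_points=1000):
    """Midpoint approximation of the eta-th Fourier Series Coefficient of g with
    respect to [-S/2, S/2): the integral of g(xi) S**-0.5 exp(-i 2 pi eta xi / S)."""
    step = S / num_points
    total = 0j
    for k in range(num_points):
        xi = -S / 2 + (k + 0.5) * step
        total += g(xi) * cmath.exp(-2j * math.pi * eta * xi / S)
    return total * step / math.sqrt(S)


def constant_one(xi):
    """Hypothetical test function: identically one on the interval."""
    return 1.0


if __name__ == "__main__":
    S = 4.0
    energy_from_coefficients = sum(
        abs(scaled_coefficient(constant_one, eta, S)) ** 2 for eta in range(-5, 6)
    )
    print(energy_from_coefficients, S)  # both sides of (A.40) for this g
```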


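Lemma A.3.6 (A Mini Parseval Theorem).


(i) If
\[ x(t) = \int_{-W}^{W} g(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}, \tag{A.41} \]
where $g\colon \mathbb{R} \to \mathbb{C}$ satisfies
\[ \int_{-W}^{W} |g(f)|^2 \, df < \infty, \tag{A.42} \]
then
\[ \int_{-\infty}^{\infty} |x(t)|^2 \, dt = \int_{-W}^{W} |g(f)|^2 \, df. \tag{A.43} \]


(ii) If for both $\nu = 1$ and $\nu = 2$

\[ x_\nu(t) = \int_{-W}^{W} g_\nu(f) \, e^{i 2\pi f t} \, df, \quad t \in \mathbb{R}, \tag{A.44} \]

where the functions $g_1, g_2\colon \mathbb{R} \to \mathbb{C}$ satisfy

\[ \int_{-W}^{W} |g_\nu(f)|^2 \, df < \infty, \quad \nu = 1, 2, \tag{A.45} \]
then
\[ \int_{-\infty}^{\infty} x_1(t) \, x_2^*(t) \, dt = \int_{-W}^{W} g_1(f) \, g_2^*(f) \, df. \tag{A.46} \]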


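Proof. We first prove Part (i). We begin by expressing the energy in $x$ in the form

\[
\begin{aligned}
\int_{-\infty}^{\infty} |x(t)|^2 \, dt
&= \sum_{\ell=-\infty}^{\infty} \int_{-\frac{\ell}{2W}}^{-\frac{\ell}{2W} + \frac{1}{2W}} |x(t)|^2 \, dt \\
&= \sum_{\ell=-\infty}^{\infty} \int_{0}^{\frac{1}{2W}} \Bigl| x\Bigl( \alpha - \frac{\ell}{2W} \Bigr) \Bigr|^2 \, d\alpha \\
&= \int_{0}^{\frac{1}{2W}} \sum_{\ell=-\infty}^{\infty} \Bigl| x\Bigl( \alpha - \frac{\ell}{2W} \Bigr) \Bigr|^2 \, d\alpha,
\end{aligned}
\tag{A.47}
\]
where in the second equality we changed the integration variable to $\alpha = t + \ell/(2W)$;
and where the third equality follows from Fubini’s Theorem and the nonnegativity
of the integrand. The proof of Part (i) will follow from (A.47) once we show that
for every $\alpha \in \mathbb{R}$

\[ \sum_{\ell=-\infty}^{\infty} \Bigl| x\Bigl( \alpha - \frac{\ell}{2W} \Bigr) \Bigr|^2 = 2W \int_{-W}^{W} |g(f)|^2 \, df. \tag{A.48} \]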


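This can be shown by noting that by (A.41)

\[ \frac{1}{\sqrt{2W}} \, x\Bigl( \alpha - \frac{\ell}{2W} \Bigr) = \int_{-W}^{W} \frac{1}{\sqrt{2W}} \, e^{i 2\pi f \alpha} \, g(f) \, e^{-i 2\pi f \ell / (2W)} \, df, \]

so $(2W)^{-1/2} \, x(\alpha - \ell/(2W))$ is the $\ell$-th Fourier Series Coefficient of the mapping
$f \mapsto e^{i 2\pi f \alpha} \, g(f)$ with respect to the interval $[-W, W)$ and consequently

\[
\begin{aligned}
\sum_{\ell=-\infty}^{\infty} \Bigl| \frac{1}{\sqrt{2W}} \, x\Bigl( \alpha - \frac{\ell}{2W} \Bigr) \Bigr|^2
&= \int_{-W}^{W} \bigl| e^{i 2\pi f \alpha} \, g(f) \bigr|^2 \, df \\
&= \int_{-W}^{W} |g(f)|^2 \, df,
\end{aligned}
\]

where the first equality follows from Theorem A.3.3 (ii) and the second because
the magnitude of $e^{i 2\pi f \alpha}$ is one.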


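To prove Part (ii) we note that by opening the square and then applying Part (i)
to the function $\beta x_1 + x_2$ we obtain for every $\beta \in \mathbb{C}$

\[
\begin{aligned}
|\beta|^2 \int_{-\infty}^{\infty} |x_1(t)|^2 \, dt &+ \int_{-\infty}^{\infty} |x_2(t)|^2 \, dt + 2 \operatorname{Re}\Bigl( \beta \int_{-\infty}^{\infty} x_1(t) \, x_2^*(t) \, dt \Bigr) \\
&= \int_{-\infty}^{\infty} |\beta x_1(t) + x_2(t)|^2 \, dt \\
&= \int_{-W}^{W} |\beta g_1(f) + g_2(f)|^2 \, df \\
&= |\beta|^2 \int_{-W}^{W} |g_1(f)|^2 \, df + \int_{-W}^{W} |g_2(f)|^2 \, df + 2 \operatorname{Re}\Bigl( \beta \int_{-W}^{W} g_1(f) \, g_2^*(f) \, df \Bigr).
\end{aligned}
\]

Consequently, upon applying Part (i) to $x_1$ and to $x_2$ we obtain
\[ \operatorname{Re}\Bigl( \beta \int_{-\infty}^{\infty} x_1(t) \, x_2^*(t) \, dt \Bigr) = \operatorname{Re}\Bigl( \beta \int_{-W}^{W} g_1(f) \, g_2^*(f) \, df \Bigr), \quad \beta \in \mathbb{C}, \]

which implies (A.46): the choice $\beta = 1$ shows that the real parts of the two integrals
agree, and the choice $\beta = i$ shows that their imaginary parts agree.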
Corollary A.3.7.


(i) Let $y\colon \mathbb{R} \to \mathbb{C}$ be of finite energy, and let $T > 0$ be arbitrary. Let
\[ g(f) = \int_{-T}^{T} y(t) \, e^{i 2\pi f t} \, dt, \quad f \in \mathbb{R}. \]
Then
\[ \int_{-T}^{T} |y(t)|^2 \, dt = \int_{-\infty}^{\infty} |g(f)|^2 \, df. \]


(ii) Let the signals $x_1, x_2\colon \mathbb{R} \to \mathbb{C}$ be of finite energy, and let $T > 0$. Define

\[ g_\nu(f) = \int_{-T}^{T} x_\nu(t) \, e^{i 2\pi f t} \, dt, \quad (\nu = 1, 2, \; f \in \mathbb{R}). \]

Then
\[ \int_{-T}^{T} x_1(t) \, x_2^*(t) \, dt = \int_{-\infty}^{\infty} g_1(f) \, g_2^*(f) \, df. \]

Proof. Part (i) follows from Lemma A.3.6 (i) by substituting $T$ for $W$; by
substituting $y$ for $g$; and by swapping the dummy variables $f$ and $t$. Part (ii) follows
analogously.
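
A quick numerical sanity check of Lemma A.3.6 (i) is possible for the hypothetical choice $g(f) = 1$ on $[-W, W]$. In that case (A.41) evaluates to $x(t) = \sin(2\pi W t)/(\pi t)$ with $x(0) = 2W$, and the right-hand side of (A.43) equals $2W$. The sketch below integrates $|x(t)|^2$ over a long but finite window $[-T, T]$ by a midpoint rule, so a small truncation error (of order $1/T$) is expected. The names, the window, and the step size are our assumptions.

```python
import math


def x_of_t(t, W):
    """x(t) from (A.41) when g(f) = 1 on [-W, W]: sin(2 pi W t)/(pi t), and 2W at t = 0."""
    if t == 0.0:
        return 2.0 * W
    return math.sin(2.0 * math.pi * W * t) / (math.pi * t)


def truncated_energy(W, T, step=0.01):
    """Midpoint approximation of the integral of |x(t)|^2 over the window [-T, T]."""
    num_points = int(round(2.0 * T / step))
    total = 0.0
    for k in range(num_points):
        t = -T + (k + 0.5) * step
        total += x_of_t(t, W) ** 2
    return total * step


if __name__ == "__main__":
    W = 1.0
    # The first number should be slightly below the second, which is 2W.
    print(truncated_energy(W, 200.0), 2.0 * W)
```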


A.4 Pointwise Reconstruction


If $g\colon I \to \mathbb{C}$ is periodically continuous, then we can reconstruct its value at every
point from its Fourier Series Coefficients:


Theorem A.4.1 (Reconstructing Periodically Continuous Functions). Let the
function $g\colon I \to \mathbb{C}$ be periodically continuous. Then

\[ \lim_{n \to \infty} \max_{\theta \in I} |g(\theta) - \sigma_n(g, \theta)| = 0. \tag{A.49} \]


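Proof. Let $g_P$ denote the periodic extension of $g$. Then for every $\theta \in I$,
\[
\begin{aligned}
g(\theta) - \sigma_n(g, \theta) &= g(\theta) - \int_I k_n(\vartheta) \, g_P(\theta - \vartheta) \, d\vartheta \\
&= \int_I k_n(\vartheta) \bigl( g_P(\theta) - g_P(\theta - \vartheta) \bigr) \, d\vartheta,
\end{aligned}
\tag{A.50}
\]

where the first equality follows from the definition of $\sigma_n(g, \theta)$ (A.12) and the second
from (A.11b). Consequently, for every $\theta \in I$,

\[
\begin{aligned}
|g(\theta) - \sigma_n(g, \theta)|
&\le \int_I k_n(\vartheta) \, |g_P(\theta) - g_P(\theta - \vartheta)| \, d\vartheta \\
&= \Bigl( \int_{-\delta}^{\delta} + \int_{\delta < |\vartheta| < \frac{1}{2}} \Bigr) k_n(\vartheta) \, |g_P(\theta) - g_P(\theta - \vartheta)| \, d\vartheta, \quad 0 < \delta < \frac{1}{2}.
\end{aligned}
\tag{A.51}
\]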
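We next treat the two integrals separately. For the first we have for every $\theta \in I$
and every $0 < \delta < 1/2$,

\[
\begin{aligned}
\int_{-\delta}^{\delta} k_n(\vartheta) \, |g_P(\theta) - g_P(\theta - \vartheta)| \, d\vartheta
&\le \max_{|\vartheta'| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta')| \bigr\} \int_{-\delta}^{\delta} k_n(\vartheta) \, d\vartheta \\
&\le \max_{|\vartheta'| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta')| \bigr\},
\end{aligned}
\tag{A.52}
\]

where the first inequality follows from the nonnegativity of $k_n(\cdot)$ (A.11a), and
where the second inequality follows because $k_n(\cdot)$ is nonnegative and integrates
over $I$ to one (A.11b). For the second integral in (A.51) we have for every $\theta \in I$
and every $0 < \delta < 1/2$,

\[ \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, |g_P(\theta) - g_P(\theta - \vartheta)| \, d\vartheta \le 2 \max_{\theta' \in I} \bigl\{ |g(\theta')| \bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta, \tag{A.53} \]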


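where the maximum on the RHS is finite because $g$ is periodically continuous.
Combining (A.51), (A.52), and (A.53) we obtain for every $0 < \delta < 1/2$

\[
\max_{\theta \in I} |g(\theta) - \sigma_n(g, \theta)|
\le \max_{\theta \in I} \max_{|\vartheta| \le \delta} \bigl\{ |g_P(\theta) - g_P(\theta - \vartheta)| \bigr\}
 + 2 \max_{\theta' \in I} \bigl\{ |g(\theta')| \bigr\} \int_{\delta < |\vartheta| < \frac{1}{2}} k_n(\vartheta) \, d\vartheta.
\tag{A.54}
\]

Because $g(\cdot)$ is periodically continuous it follows that its periodic extension $g_P$ is
uniformly continuous. Consequently, for every $\epsilon > 0$ we can find some $\delta > 0$ such
that

\[ \max_{|\vartheta| \le \delta} |g_P(\theta) - g_P(\theta - \vartheta)| < \epsilon, \quad \theta \in I. \tag{A.55} \]

By letting $n$ tend to infinity in (A.54) we obtain from (A.11c) and (A.55)

\[ \varlimsup_{n \to \infty} \max_{\theta \in I} |g(\theta) - \sigma_n(g, \theta)| < \epsilon, \]

which establishes the result because $\epsilon > 0$ was arbitrary.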


As a corollary we obtain: 


Corollary A.4.2 (Weierstrass’s Approximation Theorem). Every periodically con- 
tinuous function from I to C can be approximated uniformly using trigonometric 
polynomials. 
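
Theorem A.4.1 and Corollary A.4.2 can be illustrated with the hypothetical test function $g(\theta) = |\theta|$ on $I$, which is periodically continuous. Its Fourier Series Coefficients are $1/4$ for $\eta = 0$, $-1/(\pi \eta)^2$ for odd $\eta$, and zero for even nonzero $\eta$. The sketch below assumes the standard Cesàro form of $\sigma_n$ (weights $1 - |\eta|/n$ for $|\eta| < n$, since (A.13) is not restated here) and reports the largest error on a grid; the names and the grid size are ours.

```python
import math


def abs_coefficient(eta):
    """Closed-form Fourier Series Coefficient of g(theta) = |theta| on I:
    1/4 for eta = 0, -1/(pi eta)^2 for odd eta, and 0 for even nonzero eta."""
    if eta == 0:
        return 0.25
    return -1.0 / (math.pi * eta) ** 2 if eta % 2 else 0.0


def fejer_mean(n, theta):
    """Assumed form of sigma_n(g, theta). The coefficients are real and even in eta,
    so the complex exponentials pair up into cosines."""
    total = abs_coefficient(0)
    for eta in range(1, n):
        weight = 1.0 - eta / n
        total += 2.0 * weight * abs_coefficient(eta) * math.cos(2.0 * math.pi * eta * theta)
    return total


def max_error(n, num_points=2001):
    """Largest |g - sigma_n(g, .)| over an equally spaced grid on [-1/2, 1/2]."""
    grid = [-0.5 + k / (num_points - 1) for k in range(num_points)]
    return max(abs(abs(t) - fejer_mean(n, t)) for t in grid)


if __name__ == "__main__":
    # Each sigma_n is a trigonometric polynomial; the maximal error shrinks with n.
    for n in (4, 16, 64):
        print(n, max_error(n))
```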
Theorems Referenced by Name


Bernstein’s Inequality                              Theorem 6.7.1
Bochner’s Theorem                                   Theorem 25.8.1
Cauchy-Schwarz Inequality                           Theorem 3.3.1
Cauchy-Schwarz Inequality for Random Variables      Theorem 3.5.1
Characterization of Shift-Orthonormal Pulses        Corollary 11.3.4
Covariance Inequality                               Corollary 3.5.2
Dominated Convergence Theorem                       (Rudin, 1974, Theorem 1.34)
Factorization Theorem                               Theorem 22.3.1
Fubini’s Theorem                                    See Section 2.6
Hölder’s Inequality                                 Theorem 3.3.2
Kolmogorov’s Existence Theorem                      Theorem 25.2.1
$L_2$-Sampling Theorem                              Theorem 8.4.3
Minimum Bandwidth Theorem                           Corollary 11.3.5
Nyquist’s Criterion                                 Theorem 11.3.2
Parseval’s Theorem                                  Theorem 6.2.9
Pointwise Sampling Theorem                          Theorem 8.4.5
Pythagorean Theorem                                 Theorem 4.5.2
Riesz-Fischer Theorem                               Theorem 8.5.3
Sandwich Theorem                                    Chapter 8, Footnote 5
Triangle Inequality for Complex Numbers             (2.11) and (2.12)
Triangle Inequality in $\mathcal{L}_2$              (4.12) and (4.14)
Union-of-Events Bound (or Union Bound)              Theorem 21.5.1
Wiener-Khinchin Theorem                             Theorem 25.14.1

Abbreviations 


Abbreviations in Mathematics


CDF      Cumulative Distribution Function
CONS     Complete Orthonormal System
CRV      Complex Random Variable
CSP      Complex Stochastic Process
FDD      Finite-Dimensional Distribution
FT       Fourier Transform
IFT      Inverse Fourier Transform
IID      Independent and Identically Distributed
LHS      Left-Hand Side
MGF      Moment Generating Function
PDF      Probability Density Function
PMF      Probability Mass Function
PSD      Power Spectral Density
RHS      Right-Hand Side
RV       Random Variable
SP       Stochastic Process
WSS      Wide-Sense Stationary


Abbreviations in Communications


BER      Bit Error Rate
BPF      Bandpass Filter
LPF      Lowpass Filter
M-PSK    M-ary Phase Shift Keying
PAM      Pulse Amplitude Modulation
PSK      Phase Shift Keying
QAM      Quadrature Amplitude Modulation
QPSK     Quadrature Phase Shift Keying
List of Symbols


General


$A \Rightarrow B$            Statement B is true whenever Statement A is true.
$A \Leftrightarrow B$        Statement A is true if, and only if, Statement B is true.
$\sum$                       Summation.
$\prod$                      Product.
$\triangleq$                 Equal by definition.
$\Box$                       End of proof.


$\emptyset$                  Empty set.
$\{\cdot : \cdot\}$          The set of all objects described before the colon that satisfy
                             the condition stated after the colon.
$\# A$                       Number of elements of the set $A$.
$a \in A$                    Set membership: $a$ is an element of $A$.
$a \notin A$                 Exclusion: $a$ is not an element of $A$.
$A \subset B$                Proper subset: every element of $A$ is an element of $B$ but some
                             elements of $B$ are not elements of $A$.
$A \subseteq B$              Subset: every element of $A$ is also an element of $B$.
$B \setminus A$              Setminus: $\{b \in B : b \notin A\}$.
$A^{c}$                      Set-complement.
$A \,\triangle\, B$          Symmetric Set Difference: $(A \setminus B) \cup (B \setminus A)$.
$A \times B$                 Cartesian product: $\{(a, b) : a \in A, b \in B\}$.
$A^{n}$                      n-fold Cartesian product: $A \times A \times \cdots \times A$ (n times).
$A \cap B$                   Intersection: $\{\xi \in A : \xi \in B\}$.
$A \cup B$                   Union: elements of $A$ or $B$.


Specific Sets


$\mathbb{N}$                 Natural Numbers: $\{1, 2, \ldots\}$.
$\mathbb{Z}$                 Integers: $\{\ldots, -2, -1, 0, 1, 2, \ldots\}$.
$\mathbb{R}$                 Real Numbers.
$\mathbb{C}$                 Complex Numbers.
$\mathbb{F}_2$               Binary Field (Section 29.2).
$I$                          Unit interval $[-1/2, 1/2)$; see (A.1).


Intervals and Some Functions


$\le, <, \ge, >$             Inequality signs.
$+\infty, -\infty, \infty$   Infinities.
$[a, b]$                     Closed interval: $\{\xi \in \mathbb{R} : a \le \xi \le b\}$.
$[a, b)$                     Interval open on the right: $\{\xi \in \mathbb{R} : a \le \xi < b\}$.
$(a, b]$                     Interval open on the left: $\{\xi \in \mathbb{R} : a < \xi \le b\}$.
$(a, b)$                     Open interval: $\{\xi \in \mathbb{R} : a < \xi < b\}$.
$[0, \infty]$                Nonnegative reals including infinity: $\{\xi \in \mathbb{R} : \xi \ge 0\} \cup \{\infty\}$.
$\lfloor \xi \rfloor$        Floor: the largest integer not larger than $\xi$.
$\lceil \xi \rceil$          Ceiling: the smallest integer not smaller than $\xi$.
max                          Maximum.
min                          Minimum.
sup                          Least upper bound.
inf                          Greatest lower bound.


Complex Numbers


$\mathbb{C}$                 Complex field.
$i$                          $\sqrt{-1}$.
$\operatorname{Re}(z)$       Real part of $z$.
$\operatorname{Im}(z)$       Imaginary part of $z$.
$|z|$                        Modulus of $z$.
$z^{*}$                      Complex conjugate of $z$.
$D(z_0, r)$                  Open disc: $\{z \in \mathbb{C} : |z - z_0| < r\}$.


Limits


$a_n \to a$                  Convergence: the sequence $a_1, a_2, \ldots$ converges to $a$.
$\lim_{n \to \infty} a_n$    Limit: the limit of $a_n$ as $n$ tends to infinity.
$\to$                        Converges to.
$\varlimsup_{n \to \infty} a_n$   Upper limit (limit superior).
$\varliminf_{n \to \infty} a_n$   Lower limit (limit inferior).


Defining and Operating on Functions


$g\colon D \to R$            Function of name $g$, domain $D$, and range $R$.
$g\colon t \mapsto t^2$      Function of name $g$ mapping $t$ to $t^2$. (Domain & range
                             unspecified.)
$g \circ h$                  Composition: $\xi \mapsto g(h(\xi))$.
d                            Differentiation operator.
$\partial g(x) / \partial x^{(j)}$   Partial derivative of $g(\cdot)$ with respect to $x^{(j)}$.
$\partial g(x) / \partial x$ Jacobian matrix.
$\int_D$                     Integral over the region $D$.
$g(\xi)\big|_{\xi = \alpha}$ The evaluation of the function $\xi \mapsto g(\xi)$ at $\alpha$.
$g(\xi)\big|_{a}^{b}$        The evaluation $g(b) - g(a)$.


Function Norms, Relations, and Equivalence Classes


$\|x\|_1$                    See (2.6).
$\|x\|_2$                    See (3.12).
$\|\cdot\|_{I,1}$            See (A.14).
$x \equiv y$                 $x$ and $y$ are indistinguishable; see Definition 2.5.2.
$[x]$                        The equivalence class of $x$; see (4.60).


Function Spaces


$\mathcal{L}_1$              Integrable functions from $\mathbb{R}$ to $\mathbb{C}$ or $\mathbb{R}$ to $\mathbb{R}$ (depending on
                             context); see Sections 2.2 and 2.3.
$\mathcal{L}_2$              Square-integrable functions from $\mathbb{R}$ to $\mathbb{C}$ or $\mathbb{R}$ to $\mathbb{R}$ (depending
                             on context); see Section 3.1.
$L_2$                        Collection of equivalence classes of square-integrable functions;
                             see Section 4.7.


Special Functions


$I\{\text{statement}\}$      Indicator function. Its value is 1 if the statement is true and 0
                             otherwise.
$\mathbf{0}$                 All-zero function: $t \mapsto 0$.
$n!$                         n factorial: $1 \times 2 \times \cdots \times n$.
$\binom{n}{k}$               Number of subsets of $\{1, \ldots, n\}$ containing $k$ (distinct)
                             elements ($= n!/(k!(n - k)!)$).
$\sqrt{\xi}$                 Nonnegative square root of $\xi$.
$\cos(\cdot)$                Cosine function (argument in radians).
$\sin(\cdot)$                Sine function (argument in radians).
$\operatorname{sinc}(\cdot)$ Sinc function; see (5.20).
$\tan^{-1}(\cdot)$           Inverse tangent.
$Q(\cdot)$                   Q-function; see (19.9).
$\Gamma(\cdot)$              Gamma function; see (19.39).
$I_0(\cdot)$                 The zeroth-order modified Bessel function; see (27.47).
$\ln(\cdot)$                 Natural logarithm (base e).
$\exp(\cdot)$                Exponential function: $\exp(\xi) = e^{\xi}$.
$\xi \bmod [-\pi, \pi)$      The element of $[-\pi, \pi)$ that differs from $\xi$ by an integer multiple
                             of $2\pi$.
Operations on Signals


The mirror image of $x$; see (5.1).
The Fourier Transform of the signal $x$; see (6.1).
Inverse Fourier Transform of $x$; see (6.4).
Inner product between the signals $x$ and $y$; see (3.1) and (3.4).
Convolution of $x$ with $y$; see (5.2).
The signal $t \mapsto x(t) + y(t)$.
The scaling of the signal $x$ by complex or real number $\alpha$, i.e.,
the signal $t \mapsto \alpha x(t)$.
Self-similarity function of signal $x$.
The $\eta$-th Fourier Series Coefficient; see (A.2).


Frequency response of a unit-gain lowpass filter of cutoff frequency $W_c$.
That is, $\widehat{\mathrm{LPF}}_{W_c}(f) = I\{|f| \le W_c\}$.
Impulse response of a unit-gain lowpass filter of cutoff frequency $W_c$.
That is, $\mathrm{LPF}_{W_c}(t) = 2 W_c \operatorname{sinc}(2 W_c t)$.
Frequency response of a unit-gain bandpass filter of bandwidth $W$ around
the carrier frequency $f_c$. That is, the mapping
of $f$ to $I\{\,||f| - f_c| \le W/2\,\}$. It is assumed that $f_c > W/2$.
Impulse response of a unit-gain bandpass filter of bandwidth $W$ around
the carrier frequency $f_c$. That is, the mapping
of $t$ to $2 W \cos(2\pi f_c t) \operatorname{sinc}(W t)$. It is assumed that $f_c > W/2$.


PAM Signaling


$g$ or $\phi$                Pulse shape; see Section 10.7.
$T_s$                        Baud period; see Section 10.7.
$1/T_s$                      Baud rate.
Constellation; see Section 10.8.
Minimum distance of a constellation; see Section 10.8.
Block encoder; see Definition 10.4.1 and (18.3).
Transmitted signal at time $t$ when the data are $d$; see (28.6).
QAM Signaling


$g$ or $\phi$                Pulse shape; see Sections 16.3 & 16.5.
$T_s$                        Baud period; see Section 16.3.
$1/T_s$                      Baud rate.
Constellation; see Section 16.7.
Minimum distance of a constellation; see Section 16.7.
Block encoder; see (18.3).
The transmitted signal at time $t$ when the data are $d$; see (28.31).


Matrices


$n \times m$ matrix          A matrix with $n$ rows and $m$ columns.
$\mathsf{0}$                 The all-zero matrix.
$\mathsf{I}_n$               The $n \times n$ identity matrix.
$a^{(k,\ell)}$               The Row-$k$ Column-$\ell$ component of the matrix $A$.
$A^{*}$                      Componentwise complex conjugate.
$A^{\mathsf{T}}$             Transpose of $A$.
$A^{\dagger}$                Hermitian conjugate of $A$.
tr(A)                        Trace of $A$.
det(A)                       Determinant of $A$.
Re(A)                        Componentwise real part of $A$.
Im(A)                        Componentwise imaginary part of $A$.
$A \succeq 0$                $A$ is a positive semidefinite matrix.
$A \succ 0$                  $A$ is a positive definite matrix.


Vectors


$\mathbb{R}^n$               Set of column vectors of $n$ real components.
$\mathbb{C}^n$               Set of column vectors of $n$ complex components.
$\mathbf{0}$                 The all-zero vector.
$a^{(j)}$                    The $j$-component of the column vector $a$.
$a^{\mathsf{T}}$             The transpose of the vector $a$.
$\|a\|$                      Euclidean norm of $a$; see (20.85).
$\langle a, b \rangle_E$     Euclidean inner product; see (20.84).
$d_E(a, b)$                  Euclidean distance between $a$ and $b$, i.e., $\|a - b\|$.


Linear Algebra


$\operatorname{span}(v_1, \ldots, v_n)$   Linear subspace spanned by the n-tuple $(v_1, \ldots, v_n)$; see (4.8).
Dim(V)                       Dimension of the subspace $V$.
Ker(T)                       Kernel of the linear mapping $T(\cdot)$.
Image(T)                     Image of the linear mapping $T(\cdot)$.
Probability


Probability triplet; see Page 3.
Probability Mass Function (PMF) of $X$.
Joint PMF of $(X, Y)$.
Conditional PMF of $X$ given $Y$.
Probability density function of $X$.
Joint PDF of $(X, Y)$.
Conditional PDF of $X$ given $Y$.
Conditional PDF of $X$ given the event $A$.
Cumulative distribution function of $X$.
Characteristic function of $X$.
Moment generating function of $X$; see (19.23).
Expectation of $X$; see (17.9).
Variance of $X$; see (17.14a).
Covariance between $X$ and $Y$; see (17.17).
Conditional expectation.
Probability of an event.
Conditional probability of an event.
Probability that a RV satisfies some condition.
Conditional version of Pr[·].
Equal in law.
$X$ and $Z$ are conditionally independent given $Y$.
$\{X_k\}$                    Sequence of random variables $\ldots, X_{-1}, X_0, X_1, \ldots$
$X \sim$ Distribution        $X$ has the specified distribution.
$\chi^2_{n,\lambda}$         Noncentral $\chi^2$ distribution with $n$ degrees of freedom
                             and noncentrality parameter $\lambda$.
Bernoulli(p)                 Bernoulli distribution (takes on the values 0 and 1 with
                             probabilities $p$ and $1 - p$).
$\mathcal{U}(A)$             Uniform distribution over the set $A$.
$\mathcal{N}_{\mathbb{C}}(0, K)$   Multivariate proper complex Gaussian distribution of
                             covariance $K$; see Note 24.3.13.
$\mathcal{N}(\mu, K)$        Multivariate real Gaussian distribution of mean $\mu$ and
                             covariance $K$.


Stochastic Processes


$(X(n))$, $(X_n, n \in \mathbb{Z})$      Discrete-time stochastic process.
$(X(t))$, $(X(t), t \in \mathbb{R})$     Continuous-time stochastic process.
$K_{XX}$                     Autocovariance function.
$S_{XX}$                     Power spectral density (PSD).
$\rho_{XX}(\cdot)$           Correlation function.
Hypothesis Testing


The subset of $\mathbb{R}^d$ defined in (21.33).
RV to be guessed in binary hypothesis testing.
Log likelihood-ratio function; see (20.41).
Likelihood-ratio function; see (20.38).
Number of hypotheses in multi-hypothesis testing.
Set of hypotheses $\{1, \ldots, M\}$.
RV to be guessed in multi-hypothesis testing.
$\phi_{\text{Guess}}$        Generic guessing rule; see Sections 20.2 & 21.2.
$\phi^{*}_{\text{Guess}}$    Generic optimal guessing rule.
$\phi_{\text{MAP}}$          MAP Decision Rule.
$\phi_{\text{ML}}$           Maximum-Likelihood Rule.
$p^{*}(\text{error})$        Optimal probability of error.
$p_{\text{MAP}}(\text{error} \mid \cdot)$   Conditional probability of error of MAP rule.


The Binary Field and Binary Tuples


Binary field $\{0, 1\}$.
The set of binary $\kappa$-tuples.
Addition in $\mathbb{F}_2$; see (29.3).
Multiplication in $\mathbb{F}_2$; see (29.4).
Hamming distance; see Section 29.2.4.
Hamming weight; see Section 29.2.4.
Antipodal mappings (29.14) and (29.17).
Binary N-tuples whose $\kappa$-th component is zero; see (29.61).
Binary N-tuples whose $\kappa$-th component is one; see (29.64).
Generic element of Image(T).
Minimum Hamming distance; see (29.54).
Encoder.
Optimal probability of error in guessing the $\kappa$-th data bit.
Conditional probability of error of the MAP rule designed
to minimize block errors.
See (29.77).
Generic element of Image(enc).
The $\eta$-th symbol in the N-tuple enc(d).

Index 


A 
absolute value, 2, 16 
affine transformation 
of a multivariate Gaussian, 473 
of a scalar, 341 
of a univariate Gaussian, 341 
of a vector, 473 
all-zero 
function, 3 
matrix, 456 
signal, 3 
all-zero codeword assumption, 675-680 
almost sure convergence 
of random variables, 356 
of random vectors, 487 
amplification, 3, 27 
analytic continuation, 350, 351n 
analytic function, 60 
analytic representation, 109, 135 
of an energy-limited signal, 135 
characterization, 135 
definition, 135 
of an integrable signal, 109-116 
characterization, 110 
definition, 110 
inner products, 114 
recovering from, 113 
analytic signal, see analytic representation 
antipodal mapping, 653, 656 
argument, 65n 
Arithmetic-Geometric Inequality, 421 
assuming the all-zero codeword, 675-680 
autocorrelation function, 211, see also self- 
similarity function 
autocovariance function 
of a continuous-time SP, 517 
of a discrete-time CSP, 300 
of a discrete-time SP, 211 
average probability of a bit error, 637 


B 
Bölcskei, Helmut, xxiv
band-edge symmetry, 193 
bandlimited stochastic process, 252 
bandpass filter, 61, see also ideal unit-gain 
bandpass filter 
bandwidth, 680 
around a carrier, 101, 104 
of a product, 90-92 
of a stochastic process, 252 
of baseband representation 
energy-limited signal, 137 
integrable signal, 122 
of energy-limited signal, 81 
of integrable signal, 89 
Barker code, 264 
baseband representation, 101, 109, 116, 136, 162
FT of, 117
inner product, 125, 137, 276-278
of convolution, 126, 137
of energy-limited signal, 136-139
characterization, 136, 137
definition, 136
inner product, 137
properties, 138
recovering from, 137
sampling of, see complex sampling
of filter’s output, 128, 137
of integrable signal, 116-129
characterization, 120, 123
definition, 116
FT of, 117
inner product, 126
recovering from, 123
of QAM, 267
sampling of, see complex sampling
basis, 29, 144, 144n
baud period, 680
in PAM, 177
in QAM, 268

baud rate 
in PAM, 177 
in QAM, 268 
Baudot, J.M.E, 177n 
BER, 637 
Bernoulli, 442 
Bernstein’s Inequality, 92-93, 614 
Bessel function, 355, 624 
Bhattacharyya Bound, 373, 419-421 
bi-infinite block-mode 
with PAM 
definition, 229 
operational PSD, 255 
power, 229 
with QAM 
operational PSD, 318 
power, 313 
bi-orthogonal code, 684 
bi-orthogonal keying, 596-599 
BIBO stable, see stable 
Biglieri, Ezio, xxiv 
binary field, 654 
binary hypothesis testing, 360-403 
binary-input /output-symmetric, 676 
binary-to-complex block encoder, 308 
binary-to-reals block encoder, 173, 229 
Binomial Expansion, 592 
bit error, 654 
bit error rate, 637, 674 
bit rate, 680 
block error, 654 
block-encoder 
binary-to-complex 
definition, 308 
rate, 308 
binary-to-reals 
definition, 173 
rate, 173 
block-mode, 172-174, 313 
blocklength, 657 
Boche, Holger, xxiv 
Bochner’s Theorem, 526 
Bonferroni Inequalities, 429 
Boole’s Inequality, 414n 
Borgmann, Moritz, xxiv 
bounded-input/bounded-output stable, see 
stable 
Boyd, Stephen, xxiv 
Brandle, Marion, xxiv 
Braendle, Samuel, xxiv 
Brickwall function, 75 
FT of, 67, 75-76 


IFT of, 67, 75-76 
Bross, Shraga, xxiv 


C 
C1 
Cantor set, 8n 
carrier frequency, 103, 161, 161n 
Cauchy-Riemann equations, 291 
Cauchy-Schwarz Inequality, 18-22 
for d-tuples, 25 
for random variables, 23 
for sequences, 25 
causal filter, 58 
causality, 182 
centered complex Gaussian 
random variable, 500 
random vector, 504 
centered Gaussian 
random variable, 341 
random vector, 454 
centered stochastic process, 203 
central chi-square distribution, 352-356 
Central Limit Theorem, 339 
change of variable 
complex random variable, 291 
complex vector, 296, 305 
real vector, 290 
characteristic function 
of a central $\chi^2$, 353
of a complex random variable, 289 
of a complex random vector, 295 
of a pair of real random variables, 289 
of a real Gaussian RV, 351 
of a real Gaussian vector, 475 
of a real random variable, 350 
of a real random vector, 468-469 
of a squared Gaussian, 352 
charge density, 245 
circular symmetry, 494-511 
of a complex Gaussian, 502 
of a complex Gaussian vector, 507-509 
of a complex random vector 
and linear functionals, 503 
and linear transformations, 503 
and properness, 504 
definition, 502 
of a CRV 
and expectation, 495 
and properness, 499 
characterization, 498 
definition, 495 
clock, 613 
closed subspace, 144n, 152
code property, 659 
Coding Theory, 653 
coherent decoder, 628 
colored noise, 599-604 
compact support 
function of, 330 
complete (space), 71, 152 
complete orthonormal system, see CONS 
complex conjugate 
of a matrix, 284 
of a scalar, 15 
complex dimensions per second, 266, 271 
complex Gaussian 
random variable, 499-502, 511 
centered, 500 
circularly-symmetric, 502 
definition, 500 
proper, 500 
random vector, 504 
and linear transformations, 505 
centered, 504 
characterization, 505 
circularly-symmetric, 507 
definition, 504 
proper, 505, 507-509 
complex magnitude, see absolute value 
complex modulus, see absolute value 
complex positive semidefinite matrix, 304, 
507 
complex random variable, see CRV 
complex random vector 
characteristic function, 295 
circularly-symmetric, see circular sym- 
metry 
covariance matrix, 293 
definition, 292 
expectation, 293 
finite variance, 293 
proper, 293-295 
transforming, 296, 305 
complex sampling, 122, 162-163 
reconstruction from, 163-166 
complex signal, 3 
complex stochastic process, see CSP 
complex symbols per second, 266 
complex-valued signal, 3 
componentwise antipodal mapping, 656 
composite hypothesis testing, 430n, 614 
composition (of functions), 2 
conditional 
distribution, 363-364, 483 


independence, 379 
probability, 406 
conjugate (of a matrix), 284 
conjugate-symmetric, 65, 108 
conjugate-symmetric matrix, 284 
CONS, 143-159 
characterization, 145 
definition, 144 
for closed subspaces, 155 
for energy-limited signals that are 
bandlimited to W Hz, 148, 149 
for energy-limited signals that vanish 
outside an interval, 147 
Prolate Spheroidal Wave Functions, 
157 
consistency property (of FDDs), 513 
constellation 
M-PSK, 274 
of PAM, 177-181 
definition, 177 
minimum distance, 178 
normalization, 178 
number of points, 178 
second moment, 178 
of QAM, 274 
definition, 274 
minimum distance, 274 
number of points, 274 
second moment, 274 
QPSK, 274 
square 4-QAM, 274 
convergence of random variables 
almost surely, 356 
in distribution, 357 
in mean square, 356 
in probability, 356 
with probability one, 356 
convergence of random vectors 
almost surely, 487 
in distribution, 488 
in mean square, 487 
in probability, 487 
with probability one, 487 
convolution, 53-63, 68, 139 
baseband representation of, 126, 137 
between real and complex signals, 121 
FT of, 77 
limits of, 327 
uniformly continuous, 55 
correlation coefficient, 23 
covariance 


between two CRVs, 288-289 
between two RVs, 23
Covariance Inequality, 23 
covariance matrix 
and positive semidefinite matrices, 471 
of a complex random vector, 293 
of a real random vector, 464 
singular, 466-468 
covariance stationary, see WSS 
Craig’s formula, 345 
CRV, 283-306 
argument, 290 
characteristic function, 289 
circularly-symmetric, see circular sym- 
metry 
covariance, 288-289 
definition, 285 
density, 286 
distribution, 285 
expectation, 283, 286 
magnitude, 290 
proper, 287-288 
transforming, 289, 291 
variance, 283, 287 
cryptography, 496 
CSP 
centered, 297 
continuous time 
measurable, 315n 
operational PSD, 315 
definition, 297 
discrete-time, 297-306 
autocovariance function, 300 
covariance stationary, see WSS 
proper, 298 
PSD, 300 
second-order stationary, see WSS 
spectral distribution function, 303 
stationary, see stationary 
strict-sense stationary, see station- 
ary 
strongly stationary, see stationary 
weakly stationary, see WSS 
wide-sense stationary, see WSS 
WSS, see WSS 
finite variance, 297 
cumulative distribution function, 343 
cyclostationary, 245n 


D 
de Caen’s Inequality, 429 
decision rule, see guessing rule 
decoding rule, see guessing rule 


degree-n trigonometric polynomial, 686 
degrees of freedom 
of a central $\chi^2$ distribution, 353
of a noncentral $\chi^2$ distribution, 354
of a signal, 98 
delay 
in PAM, 181 
introduced by channel, 613 
Dembo, Amir, xxiv 
dense subset of $\mathcal{L}_1$, 330
detection in white Gaussian noise, 562-612 
M-PSK, 588-590 
antipodal signaling, 586-587 
bi-orthogonal keying, 596-599 
binary signaling, 586-588 
in passband, 584-585 
optimal decision rule, 572-576 
orthogonal keying, 590-593 
probability of error, 576-577 
signals of infinite bandwidth, 604-605 
simplex, 593-596 
sufficient statistics, 567-572 
differentiable complex function, 290 
digital implementation, 182 
dimension, 30, 657 
Dirac’s Delta, 3 
discrete-time single-block model, 642 
distance spectrum, see weight enumerator 
domain, 2 
Dominated Convergence Theorem, 702 
dual code, 683 
duality, 151 
Durisi, Giuseppe, xxiv 
dynamic range, 582 


E 
eigenvalue, 459 
eigenvector, 459 
encoder property, 659 
energy 
in baseband and passband, 126, 138 
in PAM, 220-223 
in QAM, 307-310 
of a complex signal, 16 
of a real signal, 14 
energy per bit 
in PAM, 222 
in QAM, 310 
energy per complex symbol 
in PAM, 337 
in QAM, 310 
energy per symbol 


in PAM, 222
in QAM, 310
energy-limited passband signal, see passband signal
energy-limited signal, 16
that is bandlimited, 47, 79-87
bandwidth of, 81
continuity of, 84
definition, 80
of zero energy, 80
through a stable filter,
through an ideal unit-gain LPF, 85
entire function, 60, 93-96
of exponential type, 96
Ephraim, Yariv, xxiv
equal law
complex random variables, 285
complex random vectors, 292, 295
random variables, 208
random vectors, 208
equalization, 649
equivalence class, 49-50, 70
equivalence relation, 48
essential supremum, 50n
estimation
and conditional expectation, 486
jointly Gaussian vectors, 486
Estimation Theory, 486
Euler’s Identity, 121
event, 3, 201
excess bandwidth, 193, 196, 271, 680
exclusive-or, 448, 565n, 654
expectation
of a complex random vector, 293
of a CRV, 286
of a random matrix, 464
of a random vector, 463
expected energy, 221
experiment outcome, 3, 201
exponential distribution, 353


F
$\mathbb{F}_2$, 654
Factorization Theorem, 433-435
FDD, 204, 512-515
consistency property, 513
of a continuous-time Gaussian SP, 515
symmetry property, 513
Fejér’s kernel, 687
field, 654
filter, 58-61
baseband representation of output, 128, 137
causal, 58
front-end, see front-end filter
stable, 58
whitening, see whitening filter
finite-dimensional distribution, see FDD 
finite-variance 
complex random vector, 293 
complex stochastic process, 297 
continuous-time real SP, 512 
random vector, 464 
Fisher, R. A., 451 
Forney, David Jr., xxiv 
Fourier Series, 147-148, 686-696 
CONS, 691 
pointwise reconstruction, 695 
reconstruction in ℒ₁, 688
Fourier Series Coefficient, 148, 686, 693 
Fourier Transform, 64-100
boundedness, 73 
conjugate-symmetric, 65, 101, 108-109 
continuity, 73 
definition 
for elements of L₂, 71
for signals in ℒ₁, 64
sinc(·), 70, 76
a product, 90 
baseband representation, 117 
convolution, 77 
real signals, 65 
symmetric signals, 65 
the Brickwall function, 67 
preserves inner products, 65, 67-69 
properties, 67 
reconstructing from, 74 
reconstructing using IFT, 74, 75 
frequency response, 77 
of ideal unit-gain BPF, 79 
of ideal unit-gain LPF, 78 
with respect to a band, 129 
front-end filter, 582-584 
FT, see Fourier Transform 
Fubini’s Theorem, 10, 11, 69 
function, 14 
all-zero, 3 
domain, 2, 14 
energy-limited, 15, 16 
image, 2 
injective, 172 
integrable, 5 
Lebesgue measurable, 4 
notation, 2
one-to-one, 172
onto, 2
range, 2, 14 
surjective, 2 


G 
Gallager, Robert, xxiv 
Galois Field, 654 
Gamma function, 353 
Gaussian 
complex random vector, see complex 
Gaussian 
continuous-time SP, 515-516, 518-520, 
524, 537-552, 554-558 
definition, 515 
FDDs, 515 
filtering, 546, 552 
linear functionals, 537-545 
PSD, 524 
stationary, 518 
white, xix, 554-558 
CRV, see complex Gaussian 
random variable, 341 
and affine transformations, 341 
characteristic function, 351 
convergence, 356-357 
density, 342 
MGF, 349 
standard, see standard Gaussian 
random vector, 455 
a canonical representation, 478-481 
and affine transformations, 473 
and pairwise independence, 477 
centered, 454 
characteristic function, 475 
convergence, 487-488 
density, 481-482 
linear functionals of, 482-483 
moments, 486 
standard, see standard Gaussian 
generalized Rayleigh distribution, 354 
generalized Rice distribution, 355 
generator matrix, 659 
GF(2) 
addition, 654 
multiplication, 654 
Gram-Schmidt procedure, 44-48
guessing rule 
definition, 361, 405 
MAP, see MAP 
maximum a posteriori, see MAP
maximum likelihood, see ML
ML, see ML
optimal, 362, 405
probability of error, 362, 405
randomized, 368-370, 408
with random parameter, 396-398


H 
Hösli, Daniel, xxiv
Hadamard code, 684 
half-normal, 351n 
Hamming and Euclidean distance, 657 
Hamming code, 683 
Hamming distance, 656 
Hamming weight, 656 
hard decisions, 681 
Hellinger distance, 403 
Herglotz’s Theorem, 217 
Hermite functions, 99 
Hermitian conjugate, 284 
Hermitian matrix, 284 
Hilbert Transform, 139 
Hilbert Transform kernel, 140 
Ho, Minnie, xxiv 
holomorphic function, see analytic function 
hypothesis testing 
M-ary, see multi-hypothesis testing 
binary, see binary hypothesis testing 


I 
I{·}, 1
ideal unit-gain bandpass filter, 61, 79, 103 
frequency response, 61, 79
impulse response, 61 
is not causal, 61 
is unstable, 61 
ideal unit-gain lowpass filter, 60 
cutoff frequency, 60 
frequency response, 60, 78
impulse response, 60 
is not causal, 60 
is unstable, 60 
IID random bits, 229 
image 
of a linear transformation, 655 
of a mapping, 2 
impulse response, 58 
in-phase component, 121, 122, 137 
of energy-limited signal, 137 
of integrable signal, 122 
independent random variables, 378, 476 
independent stochastic processes, 515 
indistinguishable, 48
infinite divisibility, 359 
injective, 172 
inner product, 14-25 
and baseband representation, 126 
and QAM, 275-280 
and the analytic representation, 114 
and the baseband representation, 125, 
137, 276-278 
between complex signals, 15 
between real signals, 14 
between tuples, 392n 
properties, 16 
integrable 
complex functions, 17 
complex signal, 5 
passband signal, see passband signal 
integrable signal 
definition, 5 
that is bandlimited, 87-89 
bandwidth of, 89 
through a stable filter, 90 
integral 
of a complex signal, 5-6 
definition, 5 
properties, 6 
of a real signal, 4 
inter-symbol interference, xix, 649 
Inverse Fourier Transform, 65 
definition, 66 
of symmetric signals, 65 
of the Brickwall function, 67 
properties, 66 
irrelevant data, 447-449 
and random parameters, 450 
isomorphism, 156, 166 
Itô Calculus, 605


J 
joint distribution function, 513n 
jointly Gaussian random vectors, 483-486 
and estimation, 486 


K
kernel, 655 
Kim, Young-Han, xxiv 
Koch, Tobias, xxiv 
Koksal, Emre, xxiv 
Kolmogorov’s Existence Theorem, 513 
Kolmogorov, A. N., 363, 451 
Kontoyiannis, Ioannis, xxiv 


L 
ℒ₁, 5
ℒ₁-Fourier Transform, 64
ℒ₂, 15, 26-51, 70
L₂, 43, 48-50, 70
L₂-Fourier Transform, 70-73
definition, 71
properties, 71
ℒ₂-Sampling Theorem, 151, 162, 164
for passband signals, 165 
Laneman, Nicholas, xxiv 
Lapidoth, Danielle, xxv 
Laplace Transform, 349 
Lebesgue integral, 4 
Lebesgue measurable 
complex signal, 5 
real signal, 4 
Lebesgue null set, see set of Lebesgue measure zero
length of a vector, 30 
likelihood-ratio function, 371 
linear (K, N) F2 code, 657 
linear (K, N) F2 encoder, 657 
linear binary code with antipodal signaling, 
653-682 
definition, 660 
minimizing block errors, 666-671 
max-correlation decision rule, 667 
optimal decoding, 666 
probability of a block error, 668-671 
power, 661, 664 
PSD, 664 
linear binary encoder with antipodal signaling
definition, 659 
minimizing bit errors, 671-675 
optimal decoding, 671 
probability of a bit error, 672-675 
linear combination, 28 
linear functionals 
of a Gaussian SP, 537-545 
of a SP, 530-545 
on F₂^κ, 661
on Rⁿ
definition, 482 
of Gaussian vectors, 483 
linear mapping, 655 
linear modulation, 174, 177 
linear subspace, see subspace 
linearly independent, 29 
LLR(·), 372
Loeliger, Hans-Andrea, xxiv
log likelihood-ratio function, 372
look-up table, 182
low-density parity-check codes, 682
lowpass filter, 60, see also ideal unit-gain lowpass filter
LR(·), 371


M 
magnitude, see absolute value 
MAP, 370-372, 408-409 
mapping, see function 
Markov chain, 379, 439-440 
mass density, 246 
mass line density, 247 
Massey, James, xxiv, 186 
matched filter, 58-60, 175, 176 
and inner products, 59-60 
definition, 59 
matrix 
conjugate-symmetric, 284 
conjugation, 284 
Hermitian, 284 
Hermitian conjugate, 284 
orthogonal, 458 
positive definite, 461 
positive semidefinite 
complex, 304, 507 
real, 461 
self-adjoint, 284 
symmetric, 284 
Toeplitz, 304 
transpose, 284 
matrix representation of an encoder, 658 
maximum a posteriori, see MAP 
maximum distance separable (MDS), 685 
maximum likelihood, see ML 
maximum-correlation rule, 424, 574, 575 
measurable 
complex signal, 5 
complex stochastic process, 315n 
real signal, 4 
stochastic process, 238, 529 
memoryless 
binary-input/output-symmetric, 676
property of the exponential, 205 
message error, 637, 654 
MGF, 349 
definition, 349 
of a central chi-square, 353 
of a Gaussian, 349 
of a noncentral chi-square, 354 
of a squared Gaussian, 352
of the sum of independent RVs, 354
Miliou, Natalia, xxiv 
minimum bandwidth, 192 
minimum Hamming distance, 670 
Minimum Shift Keying, 608 
mirror image, 3, 53, 66 
Mittelholzer, Thomas, xxiv 
ML, 372-373, 408-409 
mod 2 addition, 654 
modified zeroth-order Bessel function, 355, 
624 
modulation, 169 
modulator, 169 
modulus, see absolute value 
moment generating function, see MGF 
monotone likelihood ratio, 355, 491 
Morgenshtern, Veniamin, xxiv 
Moser, Stefan, xxiv 
M-PSK, 274, 410-414, 418-419, 588-590 
multi-dimensional hypothesis testing 
M-ary, 421-427 
binary, 390-396 
multi-hypothesis testing, 404-429 
multiplication by a carrier 
doubles the bandwidth, 105 
FT of the result of, 105 
multivariate Gaussian, 454-493, see also 
Gaussian 


N 
N, 1 
Narayan, Prakash, xxiv 
nearest-neighbor decoding, 410, 411, 423, 
424 
Nefedov, Nikolai, xxiv 
noncentral chi-square distribution, 352-356 
noncoherent detection, 613-631 
normal distribution, see Gaussian 
n-tuple of vectors, 28 
nuisance parameter, see random parameter 
number of nearest neighbors, 670 
Nyquist pulse, 189 
Nyquist’s Criterion, 185-197 


O 
observation, 361, 405 
one-to-one, 172 
open subset, 289, 289n 
operational PSD, 245-264 
and the PSD, 552 
definition, 250-252 
of a CSP, 315 
of PAM, 252-264
of QAM, 315-320 
uniqueness, 252 
optimal guessing rule, 362, 405 
orthogonal 
binary tuples, 683 
real passband signals, 126 
signals, 32 
orthogonal keying, 590-593 
noncoherent detection, 613-631 
orthogonal matrix, 458 
orthonormal 
basis, 36-48 
construction, 45 
definition, 37 
existence, 43 
tuple, 36 


P 
packet, 639 
pairwise independence, 476 
pairwise sufficiency, 435-439 
Paley-Wiener, 95-96 
PAM, 176-184, 220-244, 634-642 
baud period, 177 
baud rate, 177 
constellation, 177-181 
definition, 177 
minimum distance, 178 
normalization, 178 
number of points, 178 
second moment, 178 
detection in white noise, 634-642 
digital implementation, 182 
energy, 220-223 
energy per bit, 222 
energy per symbol, 222 
operational PSD, 252-264 
power, 223-244 
pulse shape, 177 
spectral efficiency, 266 
parity-check matrix, 659 
Parseval’s Theorem, 72, 115 
Parseval-like theorems, 67-69 
passband signal, 101-141 
analytic representation of, 109, 135 
definition, 103 
energy-limited, 101, 130-139 
bandwidth around a carrier, 104 
baseband representation of, 136 
characterization, 131, 133 
definition, 103
is bandlimited, 133
sampling, 161-168 
through BPF, 134 
integrable, 101 
analytic representation of, 110 
bandwidth around a carrier, 104 
baseband representation of, 116 
characterization, 103 
definition, 103 
inner product, 114 
is bandlimited, 104 
is finite-energy, 104 
through stable filter, 104 
sampling, 161-168 
periodic extension, 686 
periodically continuous, 686 
phase shift keying, see M-PSK 
picket fences, 96-98 
picket-fence miracle, 96 
π/4-QPSK, 337
Plackett’s Identities, 492 
Plancherel’s Theorem, 72 
Pointwise Sampling Theorem, 151, 163 
for passband signals, 165 
Poisson distribution, 355 
Poisson summation, 96-98 
positive definite function 
from R to C, 199 
from R to R, 521 
from Z to C, 300 
from Z to R, 212 
positive definite matrix, 461 
positive semidefinite matrix 
complex, 304, 305, 507 
real, 461 
power 
in baseband and passband, 311, 320-327
in PAM, 223-244 
in QAM, 310-314 
of a SP, 238 
power spectral density, see PSD 
Price’s Theorem, 492 
prior 
definition, 361, 404 
nondegenerate, 361, 404 
uniform, 361, 404 
probability density function, 247 
probability of error 
binary hypothesis testing 
Bhattacharyya Bound, 373 
general decision rule, 366 
in IID Gaussian noise, 393-395
in white noise, 586, 587 
no observables, 362 
noncoherent, 626-627 
optimal, 367, 407 
multi-hypothesis testing 
M-PSK, 412-414 
8-PSK, 590 
bi-orthogonal keying, 599 
in IID Gaussian noise, 425-427 
no observables, 406 
noncoherent, 630-631 
orthogonal keying, 592 
simplex, 596 
Union Bound, 416-419 
Union-Bhattacharyya Bound, 419-421
probability space, 3, 201 
processing, 376-381, 409 
projection 
as best approximation, 145 
onto a finite-dimensional subspace, 40 
onto a vector in ℒ₂, 34
onto a vector in R², 34
onto an infinite-dimensional subspace, 
159 
Prolate Spheroidal Wave Functions, 157 
proper 
complex Gaussian RV, 500 
complex Gaussian vector, 505, 507-509
complex random vector, 293-295 
CRV, 287-288 
discrete-time CSP, 298 
PSD 
of a continuous-time SP, 523, 552-554 
of a discrete-time CSP, 300 
of a discrete-time SP, 213-218 
pulse amplitude modulation, see PAM 
pulse shape 
in PAM, 177 
in QAM, 268 
Pythagoras’s Theorem, 32 
Pythagorean Theorem, 33 


Q 
QAM, 265-282, 307-338, 642-649
bandwidth, 270
baseband representation of, 267
baud period, 268
baud rate, 268
constellation, 274
definition, 274
minimum distance, 274 
M-PSK, 274 
number of points, 274 
second moment, 274 
square 4-QAM, 274 
detection in white noise, 642-649 
energy, 307-310 
energy per bit, 310 
energy per symbol, 310 
inner products, 275-280 
operational PSD, 315-320 
power, 310-314 
pulse shape, 268 
spectral efficiency, 273-274 
symbol recovery, 275-280 
Q-function, 344-348 
QPSK, 274 
quadrature amplitude modulation (QAM), 
see QAM 
quadrature component, 121, 122, 137 
of energy-limited signal, 137 
of integrable signal, 122 


R 
R, 1 
radially-symmetric function, 495 
Radon-Nikodym Theorem, 405n 
raised-cosine, 196 
random function, see stochastic process 
random parameter, 396-398, 449-451, 617 
and white noise, 613-631 
random process, see stochastic process 
random variable, 3, 201 
random vector 
characteristic function, 468-469 
covariance matrix, 464 
finite variance, 464 
randomized decision rule, 368-370, 408 
randomized guessing rule, see randomized 
decision rule 
rate, 173 
in bits per complex symbol, 172, 268 
in bits per real symbol, 172 
Rayleigh distribution, 354 
real dimensions per second, 177 
real passband signals 
analytic representation, see analytic 
representation 
baseband representation, see baseband 
representation 
condition for orthogonality, 126 
Sampling Theorem, see Sampling Theorem
real positive semidefinite matrix, 461
real signal, 2
real symbols per second, 177
real-valued signal, 2
reflection, 3, 53, see mirror image
repetition code, 683
representation of an encoder by a matrix, 658
Rice distribution, 355
Riemann
integrable, 4
integral, 4, 6
Riemann-Lebesgue Lemma, 690 
Riesz-Fischer Theorem, 153 
Rimoldi, Bixio, xxiv 
Rockefeller Foundation, xxv 


S
Saengudomlert, Poompat, xxiv 
sample function, see sample-path 
sample of a stochastic process, 202 
sample-path, 201 
sample-path realization, see sample-path 
sampling as an isomorphism, 156 
Sampling Theorem, 75, 143, 148-157 
for passband signals, 161-168 
isomorphism, 156 
ℒ₂, 151
pointwise, 151 
Sandwich Theorem, 154, 154n 
Sanjoy, Mitter, xxiv 
Sason, Igal, xxiv 
second-order stationary, see WSS 
self-adjoint matrix, 284 
self-similarity function 
of energy-limited signal, 186-188 
definition, 186 
properties, 186 
of integrable signal 
definition, 198 
FT of, 198 
set of Lebesgue measure zero, 7, 9 
Shannon, Claude E., xvii, 171 
Shrader, Brook, xxiv 
σ-algebra
generated by a SP, 514
generated by RVs, 364n
generated by the cylindrical sets, 514
product, 238, 238n
signal
complex, 14
real, 14
signature, 243 
simplex, 593-596 
simulating observables, 441-443 
sinc(·), 75
definition, 60 
FT of, 70, 76 
single parity check code, 657 
Singleton Bound, 681, 685 
singular covariance matrix, 466-468
Slepian’s Inequality, 492 
soft decisions, 681 
SP, see stochastic process 
span, 29 
spectral efficiency, 266, 273-274 
Spectral Theorem, 460 
stable filter, 58 
standard complex Gaussian 
random variable, 494-495
and properness, 495 
definition, 494 
density, 494 
mean, 495 
variance, 495 
random vector, 502 
covariance matrix, 502 
definition, 502 
density, 502 
mean, 502 
proper, 502 
standard deviation, 342 
standard Gaussian 
complex vector, see standard complex 
Gaussian 
CRV, see standard complex Gaussian 
random variable 
CDF, 348 
definition, 339 
density, 339 
moments, 351 
random vector 
covariance matrix, 470 
definition, 454 
density, 469 
mean, 470 
standard inner product, 392n 
stationarization argument, 257 
stationary 
continuous-time SP, 516 
discrete-time CSP, 297 
discrete-time SP, 208, 209 
stochastic process, 171, 201-207
centered, 203 
complex, see CSP 
continuous-time, 512-561 
autocovariance function, 517, 520-522
average power, 528-530 
bandlimited, 252 
bandwidth, 252 
centered, 512 
covariance stationary, see WSS 
definition, 204 
FDD, 512-515 
filtering, 546-552 
finite variance, 512 
finite-dimensional distribution, see 
FDD 
Gaussian, see Gaussian 
independence, 515 
linear functionals, 530-545 
measurable, 529 
path, 512 
PSD, 523, 552-554 
realization, 512 
sample-function, 512 
sample-path, 512 
second-order stationary, see WSS 
spectral distribution function, 525-528
state at time-t, 512 
stationary, see stationary 
strict-sense stationary, see stationary
strongly stationary, see stationary 
time-t sample, 512 
trajectory, 512 
weakly stationary, see WSS 
wide-sense stationary, see WSS 
WSS, see WSS 
definition, 203 
discrete-time, 208-219 
autocorrelation function, 211 
autocovariance function, 211-218 
covariance stationary, see WSS 
definition, 203 
one-sided, 204 
power spectral density, 213-218 
second-order stationary, see WSS 
spectral distribution function, 217-218
stationary, see stationary
strict-sense stationary, see stationary
strongly stationary, see stationary 
weakly stationary, see WSS 
wide-sense stationary, see WSS 
WSS, see WSS 
finite variance, 203 
measurable, 238 
power of, 238 
zero mean, 203 
strict-sense stationary, see stationary 
strictly stationary, see stationary 
strictly systematic encoder, 659 
strongly stationary, see stationary 
subspace, 28, 143 
closed, see closed subspace 
finite-dimensional, 29, 143 
basis for, 29 
dimension of, 30 
having an orthonormal basis, 40 
projection onto, 40 
infinite-dimensional, 29, 143 
sufficient statistics, 381-389, 430-453 
and computability of the a posteriori 
law, 386, 431 
and noncoherent detection, 616-621 
and pairwise sufficiency, 435-439 
and random parameters, 449 
and simulating observables, 441-443 
and the likelihood-ratio function, 383 
factorization criterion, 433-435 
for detection in additive white noise, 
567-572 
in binary hypothesis testing, 381-389 
Markov condition, 439-440 
observation SP, 563-567 
PAM in white noise, 635-642 
QAM in white noise, 642-649 
random parameters and white noise, 
617 
superposition, 3, 20, 26 
support 
compact, 330 
of a PSD, 329 
symmetric matrix, 284 
symmetric random variable, 217 
symmetric set difference, 565 
symmetry property (of FDDs), 513 
systematic encoder, 659 
systematic single parity check encoder, 657 
Szegő’s Theorem, 304
Sznitman, Alain-Sol, xxiv 
T
Taylor Series, 614 
Tchamkerten, Aslan, xxiv 
Telatar, I. Emre, xxiv 
tie, 368, 406 
Tinguely, Stephan, xxiv 
Toeplitz matrix, 304 
total positivity of order 2, 355, 491 
trajectory, see sample-path 
transforming 
complex random variables, 289, 291 
complex random vectors, 296, 305 
real random vectors, 290 
transpose (of a matrix), 284 
Triangle Inequality 
for complex numbers, 6 
for signals, 30 
for stochastic processes, 320 
trigonometric polynomial, 686 
tuple 
of bits, 173 
of signals, 28 
turbo-codes, 682 


U 
uniformly continuous, 55, 55n 
Union Bound, 414-421 
Union-Bhattacharyya Bound, 419-421
Union-of-Events Bound, see Union Bound
univariate Gaussian, 339-359, see also Gaussian


V
Varshamov Bound, 681 
vector space, 27 
Verdú, Sergio, xxiv
Viterbi Algorithm, 649 
Vontobel, Pascal, xxiv 


W
Wagner’s rule, 667 
Wang, Ligong, xxiv 
weak convergence 
of random variables, 357 
of random vectors, 488 
weakly stationary, see WSS
Weierstrass’s Approximation Theorem, 696
weight enumerator, 670
wheel-of-fortune, 496 
white Gaussian noise, xix, 554-558 
definition, 555 
detection in, 562-612
in passband, 558
properties, 555 
white noise, see white Gaussian noise 
white noise paradigm, 605 
whitening filter 
definition, 600 
existence, 604 
Wick’s Formula, 486 
wide-sense stationary, see WSS 
Wiener-Khinchin Theorem, 257, 552 
Wigger, Michele, xxiv 
worst-case performance, 628 
WSS 
continuous-time SP, 517 
discrete-time CSP, 297, 298 
discrete-time SP, 209-218 


Y 
Young’s Inequality, 61 


Z 
Z, 1
Zeitouni, Ofer, xxiv 
zero padding, 173
zeroth-order modified Bessel function,